A stochastic agent-based model of pathogen propagation in dynamic multi-relational social networks

Bilal Khan; Kirk Dombrowski; Mohamed Saad

doi:10.1177/0037549714526947

Author manuscript; available in PMC: 2015 Apr 7.

Published in final edited form as: Simulation. 2014 Apr 1;90(4):460–484. doi: 10.1177/0037549714526947

PMCID: PMC4387577 NIHMSID: NIHMS673205 PMID: 25859056

Abstract

We describe a general framework for modeling and stochastic simulation of epidemics in realistic dynamic social networks, which incorporates heterogeneity in the types of individuals, types of interconnecting risk-bearing relationships, and types of pathogens transmitted across them. Dynamism is supported through arrival and departure processes, continuous restructuring of risk relationships, and changes to pathogen infectiousness, as mandated by natural history; dynamism is regulated through constraints on the local agency of individual nodes and their risk behaviors, while simulation trajectories are validated using system-wide metrics. To illustrate its utility, we present a case study that applies the proposed framework towards a simulation of HIV in artificial networks of intravenous drug users (IDUs) modeled using data collected in the Social Factors for HIV Risk survey.

1 Introduction

Modeling the propagation of pathogens through risk-bearing interactions of actors in a social network is an emerging perspective in epidemiology, particularly in HIV research [Goforth and Berleant, 1994, Bell et al., 2002, Goodreau, 2006]. Approaches such as these shift our view of risk away from individuals to collective social bodies as the carriers and transmitters of infection. The subject of study here, then, is “risk networks”, comprised of populations whose social interconnections signify particular “risk behaviors” that bear a potential for pathogen transmission. In the context of HIV, some examples of risk behaviors include social relationships which result in drug injection equipment sharing, and sexual relationships in the context of drug use. Although HIV will be used as a case study, the model presented in this paper is general enough to be applied towards the simulation of any epidemiological scenario in which disease transmission is driven by pairwise risk behaviors across a specifiable set of relationship types. Risk networks are now widely recognized as critical factors in understanding infection patterns, as they define the natural environment in which risk behaviors occur, and through which the propagation of infection proceeds [Friedman et al., 1997, Bachanas et al., 2002]. The value of network-based simulation, then, is that it can make the dynamic structures of risk visible and compelling [Hsieh et al., 2006], and help further a change in perspective to one that sees collectivities (and their respective forms and dynamics) as health actors with specific and identifiable structures of risk.

For reasons of cost, most risk network studies are relatively small in scale compared to the size of the overall communities they seek to understand. Even large-scale network studies manage to interview only a small portion of the ambient risk network; e.g. the study of Social Factors for HIV Risk (SFHR) conducted in Brooklyn, New York, in the early 1990s involved interviews with several hundred people [Friedman, 1999] out of the 30,000-80,000 IDUs in Brooklyn at the time. In contrast, simulation allows researchers to operate at the scale of the phenomenon of interest. While simulation is necessarily far from perfect and not a substitute for direct research, when based on detailed data and constructed to conform closely to known, short-term social dynamics, it can potentially provide suggestions and even tentative conclusions about critical health phenomena at a time depth and social scale not possible in direct empirical research. Considerable prior work exists in which agent-based modeling (ABM) is applied to questions of infectious disease epidemiology (see e.g. Nikolai and Madey [2009] for a recent review of ABM toolkits).

Most previous ABM efforts consider spatial models [Bian, 2004, Dunham, 2005, Lopez-Paredes et al., 2012, Luke et al., 2005] wherein social networks are implicit through spatial proximity; networks restructure themselves dynamically as actors move (coming in and out of pairwise contact). The EpiSimS system [Stroud et al., 2007], for example, considers social contact to define a network over which the spread of pandemics may be explored via simulation. Spatial contact-based stochastic agent models have also been used to study problems of infectious disease, including Enzootic Bovine Leukemia [Bagni et al., 2002], smallpox [Eidelson and Lustick, 2004], SARS [Huang et al., 2004], and influenza [Yoneyama and Krishnamoorthy, 2012]. ABM has even been used to evaluate the impact of the adoption of health care innovations [Dunn and Gallego, 2010], and intervention strategy efficacies [Huang et al., 2010]. One strength of explicitly spatial approaches is that the micro-level movements/behaviors of individuals drive the simulation trajectory forward over time, and the parameters specifying these behaviors can be drawn from distributions that have been calibrated to behavioral profile data collected from the population modeled. A weakness of spatial models, however, is that macro-level network characteristics—e.g. degree distribution, and triangle prevalence or “transitivity” (the latter not preserved in Markov movement paradigms)—cannot be easily controlled through the course of the simulation without impinging on actor agency (though for recent progress relating spatial models with small-world network structure see Huang et al. [2005, 2009]).

Exemplary of research efforts to generate networks having specified macro-level characteristics is the work of Hamill and Gilbert [2009], wherein artificial networks are generated to mimic structural characteristics observed in real-world social networks (e.g. sparseness, short distances, searchability, fat tails, assortativeness, transitivity, and clustering). Such efforts are part of a long line of inquiry concerning the problem of generating random networks having characteristics of social networks seen “in the wild”; see Watts and Strogatz [1998], Barabasi and Albert [1999] and Dorogovtsev and Mendes [2003], for example. The problem of generating random k-regular graphs (i.e. the case where the degree distribution is uniform) has been the subject of a sequence of results starting with the work of Bender and Canfield [1978], the switching process of McKay and Wormald [1990], and the configuration model [Bollobas, 1980, Bollobas, 2001]. The more difficult problem of generating random graphs satisfying a specified univariate degree distribution (over all nodes), or bivariate degree distribution (over all edges) remains a subject of ongoing inquiry. Generally speaking, the unbalanced (power-law) degree sequences of social networks mandate the development of inhomogeneous random graph models [Bollobas and Riordan, 2008]. The problem of efficiently generating networks with a specific non-uniform degree sequence (across all nodes) is a well-known difficult problem that has received considerable attention in recent years [Bayati et al., 2010, Blitzstein and Diaconis, 2011, Chatterjee et al., 2011], and the problem of generating graphs with a prescribed joint degree distribution has only been recently addressed by Stanton and Pinar [2011] using Markov chain techniques.

Complicating matters further is the fact that the plausibility of an artificially generated social network rests on more than merely the extent to which its univariate and bivariate degree distributions reflect those observed in the real-world population being modeled. Many other aspects of network structure might influence the likelihood of edge formation. One such example arises when individual nodes are assumed to have associated attributes (e.g. gender), since then attribute homophily may exert a bias on edge likelihoods (e.g. if same gender or opposite gender links are more predominant in the population/relationship being modeled). Another example arises in the presence of small scale structural effects like transitivity (the bias to edges forming between two individuals who share a network neighbor). To determine the extent to which edge formation is influenced by phenomena such as attribute homophily or relation transitivity, one may employ the techniques of Exponential Random Graph Modeling (ERGM), which were originally put forth by Holland and Leinhardt [1981] and Frank and Strauss [1986], with estimation questions settled recently by Snijders et al. [2006]. ERGM models of networks can be used to generate artificial networks [Goodreau, 2007, Goodreau et al., 2009, Kolaczyk, 2010, Lieberman, 2012]. As described by Goodreau, such studies can also create dynamic networks where the connections between node actors are periodically reassigned according to a given distribution of pair-wise likelihoods [Goodreau, 2011]. As such, ERGM networks can be made to “evolve” over time, though at the cost of readily controllable actor agency. One strength of ERGM simulation models then is that macro-level network characteristics (e.g. the network's instantaneous degree distribution) drive the simulation trajectory over time, and these characteristics may be calibrated against measurements of the actual population being modeled. 
A weakness of ERGM simulation models, on the other hand, is that the micro-level behaviors of individuals (implicit in edge restructuring) cannot be readily controlled and made to reflect the known behavioral profiles exhibited in the population being modeled. Indeed, when discussing their future efforts, Snijders et al. refer to the need for “stochastic actor-based models for network dynamics” [Snijders et al., 2010]. As Snijders describes it, networks gain their dynamism as actors come and go from the network, and when they change their mutual connections due to:

... the structural positions of the actors within the network—e.g., when friends of friends become friends—characteristics of the actors (“actor covariates”), characteristics of pairs of actors (“dyadic covariates”), and residual random influences representing unexplained influences [Snijders et al., 2010, p. 44].

What is needed is a framework in which macro-level network characteristics and individual micro-level behavioral profiles both play a role. Such a framework is developed and presented here, specifically for application towards the study of disease epidemiology. The framework is designed with the following guiding principles in mind:

like ABM-based simulations, we want to maintain an actor-based environment, where actions which determine network dynamism originate in characteristics of the nodes themselves. Such actor-based dynamism should include risk behaviors, length of network participation, and when and how to establish new network connections or get rid of prior ones.
like ERGM-based simulations, we want link creation to reflect node-specific attributes (such as gender, age, ethnicity/race), and local structural tendencies (e.g. network transitivity), so that our dynamic network remains “real-world viable” over long simulation trajectories, even while individual nodes/actors enter or leave the network.
like ABM-based simulations, we want our actors to exhibit individual behavior patterns (beyond node-specific characteristics such as age, or gender) parametrized from a distribution of possibilities, and to allow for different modalities of participation (such as one might see from two very different classes of network actors who are otherwise indistinguishable on the basis of node characteristics).
like ERGM-based models, we want to be able to control for network-level factors affecting overall network dynamism, such as bounded deviation from a specific network-wide degree distribution. And finally,
like ABM-based simulations, we want to be able to simulate large networks, to determine whether factors of scale influence network dynamics and infection trajectories over simulation time, and to examine simulation results on a scale of the phenomena of interest.

Towards this, Marshall et al. [2012] have recently made progress by demonstrating an ABM approach that considers both macro-level network characteristics and individual micro-level behavioral profiles, in the context of their work on HIV interventions in IDU risk networks. In this paper we present a case study that extends the approach of Marshall et al., providing an illustrative application of our general-purpose framework for epidemiological modeling which considers multi-pathogen multi-layer networks by synthesizing both ERGM and ABM approaches. A more detailed comparison of features is given as part of the case study (see Section 5).

The framework is presented in stages. We begin, in Section 2, by considering static risk networks. First, in Subsection 2.1, we describe how a population survey can be used to obtain a description of a concrete real-world risk network, and how from this one may determine which attributes (of individuals) exert the greatest influence on the formation of risk relationships. In Subsection 2.2, we present the derivation of an (m, l, p) statistical network model: in effect, a distribution over the space of all static networks, which reflects the properties of the concrete risk network being modeled. In Subsection 2.3, we discuss how one may use a statistical network model to generate new (static) artificial risk networks. Finally, in Subsection 2.4, we address the need to validate generated artificial risk networks against the original real-world risk network from which the generative model was distilled. To simulate infection across these networks, the framework is extended in Section 3 to permit each of p distinct types of pathogens to flow between individuals via any of the l different types of risk relationships. Next, in Section 4, network dynamism is captured via node arrival and departure processes, incremental changes to individual risk relationship structures, and node aging. Lastly, to illustrate the efficacy of the proposed framework, we apply it towards a case study of HIV in injection drug user (IDU) communities where drug equipment sharing and sex in the context of drug use are the principal risk events underlying the transmission of HIV; the case study and associated simulation results are presented in Section 5.

Throughout this exposition, we adhere to certain notational conventions. Sets will be denoted by capital letters, A, B, C, etc., and will be indexed by integer variables i, j, k, etc. Elements within sets will be written in lower case Roman letters, a, b, c, etc. Distributions will usually be expressed as α, β, γ, etc. Types or proper names will be represented in script $A, B, C$, etc. In an exposition where a set or function (e.g. the actors V) must be considered time-dependent, the temporal index will appear as a superscript (causing us to write V^t for the set of actors at time t). In situations where a set or function (e.g. the infectiousness curve I) is being seen in the context of a particular layer, attribute, or pathogen type, this dependency will be made clear in the subscript of the variable (e.g. I_j,k is the infectiousness curve of pathogen k via risk acts in layer j). Both superscripts (for time) and subscripts (for context) will be employed simultaneously when referencing sets or functions that are both time and context dependent (e.g. $N_{j}^{t} (v)$ is the set of neighbors of actor v within network layer j at time t). A function f whose domain is D and range is R will be declared so by the statement f : D → R. Set differences are indicated using the \ operator.

2 Modeling network structure

We view a risk network as an l-layer combinatorial fabric, weaving together a set of n individuals, each of whom has m attributes, and may host one or more of p distinct types of pathogens. In what follows, we present how a real-world risk network is described (2.1), modeled statistically (2.2), and how the statistical network model can be subsequently used to sample new artificial risk networks (2.3) that can be validated against the original real-world network (2.4).

2.1 Obtaining data on real-world risk networks

In a survey of a population V, each constituent individual v is interrogated regarding a fixed set of m attributes X = {x₁,. . ., x_m}, e.g., x₁ could be gender, while x₂ might be age, etc. We assume that each variable x_i (for i = 1,. . ., m) is categorical, taking values from a finite set U_i that is known in advance (e.g. U₁ could be {Male, Female}, while U₂ might be {21AndUnder, Over21}). Each node attribute x_i (i = 1,. . ., m) is seen as a function x_i : V → U_i.

Yet to model a risk network, the survey must go beyond individual attributes and collect data on the risk relationships between them. The relationships of interest might be of several concrete types $I_{1}, I_{2}, \dots, I_{l}$. For example, $I_{1}$ might be the relationship of “sharing injection equipment with”, while $I_{2}$ relationships could embody “sexual partnership”, etc. In practice, during the survey, each individual v from V is questioned about their risk relationships for each type $I_{j}$ (for j = 1,. . ., l), and is asked to provide sufficient information with which to identify the individuals N_j(v) ⊆ V with whom v enjoys a type $I_{j}$ relationship. In other words, the survey must be capable of capturing ego network data that, in turn, can be aggregated to produce a network whose structural features are representative of the topological characteristics of the risk network as a whole. The data thus collected is used to define a degree $d_{j} (v) \overset{def}{=} ∣ N_{j} (v) ∣$ for each individual v; the value of this quantity is the number of $I_{j}$ relationships that v has. The set of all pairwise relationships at each network layer j = 1,. . ., l is then expressible as the set of pairs $E_{j} = ⋃_{v \in V} (\{v\} \times N_{j} (v))$.

Finally, the survey must produce data on the prevalence and distribution of the pathogens of interest, which may be of several distinct types $P_{1}, P_{2}, \dots, P_{p}$ . Specifying the instantaneous state of a risk network thus requires a collection of p concrete sets of individuals A₁, A₂,. . ., A_p ⊆ V where A_k is the set of individuals who are positive for pathogen type $P_{k}$ (k = 1, 2,. . ., p).

Collecting the above elements, we define a risk network to be an (m + 2l + p + 1)-tuple $D \overset{def}{=} (x_{i}, V, E_{j}, A_{k}, d_{j})$ where i = 1,. . ., m; j = 1,. . ., l; k = 1,. . ., p.
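As an illustration only, the tuple above can be held in a small container. The sketch below is a hypothetical Python layout, not part of the framework's published listings: the class name `RiskNetwork`, its field names, and the toy data are all assumptions. Neighbor sets N_j(v) are stored per layer, and the degree d_j(v) and edge set E_j are derived from them.

```python
from dataclasses import dataclass


@dataclass
class RiskNetwork:
    """Hypothetical container for D = (x_i, V, E_j, A_k, d_j)."""
    nodes: set        # V
    attributes: dict  # attribute name -> {node: categorical value}, i.e. x_i
    neighbors: dict   # layer name -> {node: set of neighbors}, i.e. N_j(v)
    infected: dict    # pathogen name -> set of positive nodes, i.e. A_k

    def degree(self, layer, v):
        """d_j(v) = |N_j(v)|."""
        return len(self.neighbors[layer].get(v, set()))

    def edges(self, layer):
        """E_j as a set of ordered pairs (v, w) with w in N_j(v)."""
        return {(v, w)
                for v, nbrs in self.neighbors[layer].items()
                for w in nbrs}


# Toy data (invented for illustration): three people, one layer, one pathogen.
net = RiskNetwork(
    nodes={"a", "b", "c"},
    attributes={"gender": {"a": "Male", "b": "Female", "c": "Male"}},
    neighbors={"injection": {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}},
    infected={"HIV": {"b"}},
)
print(net.degree("injection", "a"), sorted(net.edges("injection")))
```

Storing each undirected relationship in both orientations is a design choice of this sketch; it keeps E_j consistent with the pair-based definitions used for the bivariate distributions.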

2.2 Defining a statistical network model

In modeling a risk network $D$ , the question arises as to the contents of the model, and particularly, which m attributes X = {x₁,. . . x_m} to consider. Questions of determinate variables (and their relative importance), are of paramount importance to our modeling process. In particular, we need to determine which individual attributes were important to the formation of the network, and to know how these attributes rank relative to one another.

Recently, the statistical analysis of network data has been advanced considerably by the introduction of Exponential Random Graph Modeling (ERGM), which provides researchers with an alternative to the simple cross-tabulation of network link data. ERGM is a statistical technique aimed at determining the extent to which the likelihood of network linkages appears to be biased towards (or against) the creation of specified network substructures, above and beyond what is expected by chance occurrence. Such substructures can be as simple as the tendency of “like” nodes to be connected (at a greater rate than expected by a random distribution of connections), or as complex as specific structures of connection between several individuals [Bearman et al., 2004]. The theoretical basis for ERGM analysis was laid down by Holland and Leinhardt [1981] and Frank and Strauss [1986], with estimation questions finally settled only recently [Snijders et al., 2006]. Readers can find a detailed exposition of ERGM in Goodreau [2007], Goodreau et al. [2009], Kolaczyk [2010].

To begin the process of parameterizing the models, we apply ERGM analysis to the data set $D$ obtained from the survey. The outcome of such analysis is the m most influential attributes X = {x₁,. . ., x_m}, together with weights that quantify their relative influence. In addition, we use ERGM to evaluate the influence of significant network substructures. These can be as simple as edge reciprocity or network transitivity, or as complex as those discussed by Bearman and colleagues [Bearman et al., 2004]. Here we consider only the influence of triadic closure (i.e. transitivity) on link formation in each of the l network layers; these l weights are denoted $w_{j}^{Δ}$ (for j = 1,. . ., l). Triadic closure was found to be present in the SFHR network data on which the case study of our framework is based. SFHR considered risk relationships between intravenous drug users which were based on equipment co-use, and the stigmatized nature of the pairwise relationships clearly leads to a bias towards triad formation (if A co-uses with B and C, then B and C are more likely to co-use together). In other networks, such as those where the edge relation signifies sexual intercourse, no bias towards transitivity is seen. ¹

Attribute Distributions. Given a risk network $D = (x_{i}, V, E_{j}, A_{k}, d_{j})$ where i = 1,. . ., m; j = 1,. . ., l, k = 1,. . ., p, we can from each of the attributes x_i (i = 1,. . ., m), determine a univariate attribute distribution α_i : U_i → [0, 1] for i = 1, 2,. . ., m, where for u ∈ U_i,

α_{i} (u) \overset{def}{=} \frac{1}{∣ V ∣} ∣ x_{i}^{- 1} (u) ∣ .

If a chi-squared test reveals a significant level of association to be present between α_i and α_i′ (for i ≠ i′), then the categorical attribute variables x_i and x_i′ are coalesced into a new joint variable x* defined over the Cartesian product of categorical spaces U_i × U_i′. The joint distribution α* over a suitably binned U_i × U_i′ is used whenever we need to sample a pair of values (x_i, x_i′). In this manner, we may inductively coalesce all attributes which show significant pairwise dependencies. Given this strategy, in what follows we simplify the exposition by assuming that {α_i | i = 1,. . ., m} is a set of pairwise independent distributions.

Next, each set of type- $I_{j}$ relationships E_j (for j = 1, 2,. . ., l) is used to define a bivariate attribute distribution β_i,j : U_i × U_i → [0, 1] for i = 1, 2,. . ., m, where for each u₁, u₂ ∈ U_i,

β_{i, j} (u_{1}, u_{2}) \overset{def}{=} \frac{1}{∣ E_{j} ∣} ∣ (x_{i}^{- 1} (u_{1}) \times x_{i}^{- 1} (u_{2})) \cap E_{j} ∣ .
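The two attribute distributions reduce to simple counting. The following is a minimal sketch, assuming that an attribute x_i is given as a Python dict from node to category and that E_j is a set of ordered pairs; the function names and toy data are hypothetical.

```python
from collections import Counter


def attribute_distribution(values):
    """alpha_i(u) = |x_i^{-1}(u)| / |V| for a mapping node -> category."""
    counts = Counter(values.values())
    return {u: c / len(values) for u, c in counts.items()}


def bivariate_attribute_distribution(values, edges):
    """beta_{i,j}(u1, u2): fraction of pairs in E_j whose endpoints
    carry the attribute values (u1, u2)."""
    counts = Counter((values[v], values[w]) for v, w in edges)
    return {pair: c / len(edges) for pair, c in counts.items()}


# Toy data (invented): four people, one attribute, one layer.
gender = {"a": "Male", "b": "Female", "c": "Male", "d": "Female"}
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")}

alpha = attribute_distribution(gender)
beta = bivariate_attribute_distribution(gender, edges)
print(alpha)
print(beta)
```

Category pairs that never occur on an edge are simply absent from `beta`, which corresponds to a value of zero.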

Degree Distributions. Next, we model the layer-j degree distribution for the population, taking care to account for the fact that individual attributes and degrees are often related.² To capture this, we determine a suitable partition of the Cartesian product of the categorical spaces U_i (i = 1,. . ., m)

C_{1}^{j} ⊔ C_{2}^{j} ⊔ \dots ⊔ C_{s}^{j} ⊔ \dots ⊔ C_{S_{j}}^{j} = \prod_{i = 1}^{m} U_{i} .

Informally, each of the $C_{s}^{j} (s = 1, \dots, S_{j})$ represents a distinct class of individuals, where classes are differentiated from one another because they exhibit different “ideal layer-j degree” distributions. In practice, the value of S_j and the definition of each class $C_{s}^{j} (s = 1, \dots, S_{j})$ is determined by performing a statistical analysis to discover which univariate attributes appear to significantly influence vertex degree (in layer j). From the results of such an analysis, classes are suitably defined so that the individuals within a single class can be assumed to draw their ideal layer-j degree from a distribution that is independent of their individual attribute values.

Since the set $\{C_{s}^{j} ∣ s = 1, \dots, S_{j}\}$ is a partition of $\prod_{i = 1}^{m} U_{i}$, and every v ∈ V has attributes (x₁(v), x₂(v),. . ., x_m(v)) which lie in exactly one of the S_j classes, we obtain a natural classification function $f_{j}^{C} : V \to \{1, 2, \dots, S_{j}\}$. Given such a classification function, the individuals V in a risk network can be naturally partitioned by class:

V (C_{1}^{j}) ⊔ V (C_{2}^{j}) ⊔ \dots ⊔ V (C_{s}^{j}) ⊔ \dots ⊔ V (C_{S_{j}}^{j}) = V

where $V (C_{s}^{j}) \overset{def}{=} \{v \in V ∣ f_{j}^{C} (v) = s\}$.

Each layer-j degree class $C_{s}^{j} (s = 1, \dots, S_{j})$ exhibits its own univariate degree distribution $χ_{j; s} : \mathbb{Z} \times \mathbb{Z} \to \mathbb{R}$ where for every pair of integers a < b, and s ∈ {1,. . ., S_j}:

χ_{j; s} (a, b) \overset{def}{=} \frac{∣ \{v \in V (C_{s}^{j}) ∣ a \leq d_{j} (v) < b\} ∣}{∣ V (C_{s}^{j}) ∣} .

In general, the layer-j degree distributions χ_j;s(a, b) for different classes s may differ from one another, and may differ from the overall “class-neutral” layer-j degree distribution

χ_{j; *} (a, b) \overset{def}{=} \frac{∣ \{v \in V ∣ a \leq d_{j} (v) < b\} ∣}{∣ V ∣} .
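To make the interval form of these definitions concrete, the sketch below evaluates χ over a half-open degree range [a, b) for one class and for the whole population. It assumes degrees and class labels are available as plain dicts; the helper name `degree_distribution` and the toy values are hypothetical.

```python
def degree_distribution(degrees, members, a, b):
    """Fraction of `members` whose layer-j degree d satisfies a <= d < b."""
    members = list(members)
    hits = sum(1 for v in members if a <= degrees[v] < b)
    return hits / len(members)


# Toy data (invented): layer-j degrees and a two-class classification f_j^C.
degrees = {"a": 2, "b": 1, "c": 1, "d": 0, "e": 5}
class_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}

class_1 = [v for v in degrees if class_of[v] == 1]
class_2 = [v for v in degrees if class_of[v] == 2]

chi_1 = degree_distribution(degrees, class_1, 1, 3)        # chi_{j;1}(1, 3)
chi_2 = degree_distribution(degrees, class_2, 1, 3)        # chi_{j;2}(1, 3)
chi_all = degree_distribution(degrees, degrees, 1, 3)      # chi_{j;*}(1, 3)
print(chi_1, chi_2, chi_all)
```

In this toy population the two classes differ sharply on the same interval, which is the situation that motivates partitioning by class rather than using the class-neutral distribution alone.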

If a chi-squared test reveals a significant level of association to be present between the class-neutral degree distributions of two layers, say χ_j;* and χ_j′;* (for j ≠ j′), then the degree distributions of the two layers j and j′ must be coalesced into a new joint variable χ* over the Cartesian product $\mathbb{N} \times \mathbb{N}$, using a common refinement of the classification schemes $\{C_{s}^{j} ∣ s = 1, \dots, S_{j}\}$ and $\{C_{s}^{j^{'}} ∣ s = 1, \dots, S_{j^{'}}\}$. The distribution χ* is used to simultaneously sample a pair of degrees (for network layers j and j′). In this manner, we may (for the purpose of degree sampling) inductively coalesce all network layers which show significant pairwise dependency in their degree distributions. Given this strategy, without loss of generality, in what follows we simplify the exposition by assuming that the set {χ_j;* | j = 1,. . ., l} consists of pairwise independent distributions. The degree distribution in layer j is captured by the set of pairs

X_{j} \overset{def}{=} \{(C_{s}^{j}, χ_{j; s}) ∣ s = 1, 2, \dots, S_{j}\}

which functionally specifies a distribution for each of the classes in $\{C_{s}^{j} ∣ s = 1, \dots, S_{j}\}$.

For each j = 1,. . ., l we also define a bivariate degree distribution $\overset{=}{χ_{j}} : {(\mathbb{Z} \times \mathbb{Z})}^{2} \to \mathbb{R}$ where for every 4-tuple of integers a < b, a′ < b′:

\overset{=}{χ_{j}} (a, b, a^{'}, b^{'}) \overset{def}{=} \frac{∣ \{e \in E_{j} ∣ e = (u, v); a \leq d_{j} (u) < b; a^{'} \leq d_{j} (v) < b^{'}\} ∣}{∣ E_{j} ∣} .
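The bivariate degree distribution can be computed the same way, counting edges rather than nodes. This is a sketch under the assumption that E_j is stored as ordered pairs with each undirected relationship present in both orientations; the function name and the star-shaped toy network are hypothetical.

```python
def bivariate_degree_distribution(edges, degrees, a, b, a2, b2):
    """Fraction of pairs (u, v) in E_j with a <= d_j(u) < b and a2 <= d_j(v) < b2."""
    hits = sum(1 for u, v in edges
               if a <= degrees[u] < b and a2 <= degrees[v] < b2)
    return hits / len(edges)


# Toy star network (invented): "a" is linked to "b" and "c".
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")}
degrees = {"a": 2, "b": 1, "c": 1}

# Share of pairs running from a degree-2 node to a degree-1 node.
hub_to_leaf = bivariate_degree_distribution(edges, degrees, 2, 3, 1, 2)
# Share of pairs joining two degree-1 nodes (none in a star).
leaf_to_leaf = bivariate_degree_distribution(edges, degrees, 1, 2, 1, 2)
print(hub_to_leaf, leaf_to_leaf)
```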

Pathogen Distributions. For each of the p pathogen types $P_{k}$ (k = 1, 2,. . ., p), we model its prevalence, taking care to account for the fact that individual attributes and pathogen prevalences are often related.³ To capture this, we determine a suitable partition of the Cartesian product of the categorical spaces U_i (i = 1,. . ., m)

B_{1}^{k} ⊔ B_{2}^{k} ⊔ \dots ⊔ B_{r}^{k} ⊔ \dots ⊔ B_{R_{k}}^{k} = \prod_{i = 1}^{m} U_{i} .

Informally, each of the $B_{r}^{k} (r = 1, \dots, R_{k})$ represents a distinct class of individuals, where classes are differentiated from one another because they exhibit different “pathogen-k prevalence” levels. In practice, the value of R_k and the definition of each class $B_{r}^{k} (r = 1, \dots, R_{k})$ is determined by performing a statistical analysis to discover which univariate attributes appear to significantly influence pathogen prevalence (with respect to pathogen k). From the results of such an analysis, classes are suitably defined so that the individuals in a single class can be assumed to draw their pathogen-k infection status via a Bernoulli trial whose outcome is positive with a constant probability that is independent of the individual's attributes.

Since the set $\{B_{r}^{k} ∣ r = 1, \dots, R_{k}\}$ is a partition of $\prod_{i = 1}^{m} U_{i}$, and every v ∈ V has attributes (x₁(v), x₂(v),. . ., x_m(v)) which lie in exactly one of the R_k classes, we obtain a natural classification function $f_{k}^{B} : V \to \{1, 2, \dots, R_{k}\}$. Given such a classifying function, the individuals V in a risk network may be partitioned by class:

V (B_{1}^{k}) ⊔ V (B_{2}^{k}) ⊔ \dots ⊔ V (B_{r}^{k}) ⊔ \dots ⊔ V (B_{R_{k}}^{k}) = V

where $V (B_{r}^{k}) \overset{def}{=} \{v \in V ∣ f_{k}^{B} (v) = r\}$.

Each pathogen-k prevalence class $B_{r}^{k} (r = 1, \dots, R_{k})$ exhibits its own pathogen prevalence $p_{k; r} \in \mathbb{R}$ where for every k = 1, 2,. . ., p, and r ∈ {1,. . ., R_k}:

p_{k; r} \overset{def}{=} \frac{∣ A_{k} \cap V (B_{r}^{k}) ∣}{∣ V (B_{r}^{k}) ∣} .

where

A_{k} \overset{def}{=} \{v \in V ∣ v \text{ is positive for pathogen } k\} .

The prevalence of pathogen k is captured by the set of pairs

P_{k} \overset{def}{=} \{(B_{r}^{k}, p_{k; r}) ∣ r = 1, 2, \dots, R_{k}\}

which functionally specifies a distribution for each of the classes in $\{B_{r}^{k} ∣ r = 1, \dots, R_{k}\}$.
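The per-class prevalence p_{k;r} is again a ratio of counts. The sketch below assumes the classification function f_k^B is supplied as a Python callable and the positive set A_k as a Python set; `prevalence_by_class` and the toy data are hypothetical names and values.

```python
from collections import Counter


def prevalence_by_class(nodes, positive, classify):
    """p_{k;r} = |A_k ∩ V(B_r^k)| / |V(B_r^k)| for every class r that occurs."""
    totals, hits = Counter(), Counter()
    for v in nodes:
        r = classify(v)
        totals[r] += 1
        if v in positive:
            hits[r] += 1
    return {r: hits[r] / totals[r] for r in totals}


# Toy data (invented): classes defined by a single attribute.
gender = {"a": "Male", "b": "Female", "c": "Male", "d": "Female", "e": "Male"}
positive = {"a", "b", "c"}  # A_k

prevalence = prevalence_by_class(gender, positive, lambda v: gender[v])
print(prevalence)
```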

The statistical network model $M (D)$ of risk network $D$ is the (m + (m + 2)l + p)-tuple:

M (D) \overset{def}{=} {(α_{i}, β_{i, j}, X_{j}, \overset{=}{χ_{j}}, P_{k})}_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p} .

2.3 Generating networks from a statistical network model

Given a statistical network model $M$ , procedure MakeNetwork (Listing 1) instantiates a new artificial risk network of size n, using $M$ as a statistical guideline.

In the first phase (line 1 of Listing 1), the MakePopulation procedure is called (Listing 2), which in turn, creates n individuals, assigning each of their m attributes independently at random, using the univariate distributions α₁,. . ., α_m (lines 4,5). Then (lines 7-10) the degree distributions $X_{j}$ are used to assign each individual an ideal ego network size, or ideal degree, d_j(v), based on v's ideal layer-j degree class s, for each of the layers j = 1,. . ., l. Justification for individuals having an intrinsic ideal degree comes from prior work on the emergence of “roles” within risk networks [Friedman et al., 1998, Curtis et al., 1995, Romero-Severson et al., 2012].
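The population phase described above can be sketched as follows. This is not a reproduction of Listing 2: the function name `make_population`, the dict-based data layout, and the simplification of each class's degree distribution to a table of degree probabilities are all assumptions made for illustration.

```python
import random


def make_population(n, alphas, classify, degree_dists, rng):
    """Create n individuals with independently sampled attributes and
    one ideal degree per layer.

    alphas:       {attribute: {value: probability}}
    classify:     {layer: function(attrs) -> degree class id}
    degree_dists: {layer: {class id: {degree: probability}}}
    """
    population = []
    for _ in range(n):
        # Sample each attribute independently from its univariate distribution.
        attrs = {
            name: rng.choices(list(dist), weights=list(dist.values()))[0]
            for name, dist in alphas.items()
        }
        # Sample an ideal degree per layer from the class-specific distribution.
        ideal = {}
        for layer, dists in degree_dists.items():
            dist = dists[classify[layer](attrs)]
            ideal[layer] = rng.choices(list(dist), weights=list(dist.values()))[0]
        population.append({"attrs": attrs, "ideal_degree": ideal})
    return population


# Toy model (invented values).
alphas = {"gender": {"Male": 0.7, "Female": 0.3}}
classify = {"injection": lambda attrs: attrs["gender"]}
degree_dists = {"injection": {"Male": {1: 0.5, 4: 0.5}, "Female": {2: 1.0}}}

pop = make_population(5, alphas, classify, degree_dists, random.Random(42))
print(pop[0])
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing many sampled networks.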

In the second phase (line 2 of Listing 1), the MakePathogens procedure is called (see Listing 3), which in turn, distributes each of the p types of pathogens (line 2) to each of the individuals in V (line 3), in a manner that reflects the specified prevalence levels for the particular pathogen type (lines 4-7), based on v's pathogen-k prevalence class t, for each of the pathogens k = 1,. . ., p.
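A minimal sketch of this seeding step is given below, with one Bernoulli trial per individual and pathogen. It is not Listing 3: the name `make_pathogens`, the data layout, and the toy prevalence values (chosen as 0 and 1 so the outcome is deterministic) are assumptions.

```python
import random


def make_pathogens(population, prevalence, classify, rng):
    """Assign infection status by one Bernoulli trial per person and pathogen.

    prevalence: {pathogen: {class id: probability of being positive}}
    classify:   {pathogen: function(attrs) -> prevalence class id}
    Returns {pathogen: set of indices of positive individuals}.
    """
    infected = {k: set() for k in prevalence}
    for idx, person in enumerate(population):
        for k, by_class in prevalence.items():
            p = by_class[classify[k](person["attrs"])]
            if rng.random() < p:  # rng.random() lies in [0, 1)
                infected[k].add(idx)
    return infected


# Toy data (invented): extreme prevalences make the result deterministic.
population = [
    {"attrs": {"gender": "Male"}},
    {"attrs": {"gender": "Female"}},
    {"attrs": {"gender": "Male"}},
]
prevalence = {"HIV": {"Male": 1.0, "Female": 0.0}}
classify = {"HIV": lambda attrs: attrs["gender"]}

infected = make_pathogens(population, prevalence, classify, random.Random(0))
print(infected)
```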

In the third phase (line 3 of Listing 1), the MakeRelations procedure creates risk relationships between individuals (see Listing 4). To do this, it initializes the layer j neighbors (line 3) of each node v_i (line 2) to be the empty set (line 4), and then schedules d_j(v_i) executions of AddEdge for each node v_i at each layer j (lines 5-6). Because all calls to AddEdge are at times < 1, MakeRelations need only wait until time 1 before aggregating the set of all edges (line 8).

Each execution of AddEdge is in the context of a given vertex v, layer j, and time t (Listing 5). The procedure first computes the set $C_{j}^{t} (v)$ of candidate new layer-j neighbors of v (line 2), proceeding only if this is nonempty (line 3). It then (i) computes the layer j edge deficit for each candidate vertex c (line 5), taking this to be the difference between c's ideal degree d_j(c) and actual degree $∣ N_{j}^{t} (c) ∣$, rescaled into the interval [0, 1] by composing with the smooth squashing function $e^{\frac{- 1}{x}}$. The squashing function approaches 1 as x → ∞ and 0 as x → 0⁺. The quantity a_δ(c) is thus close to 1 whenever $∣ N_{j}^{t} (c) ∣ ≪ d_{j} (c)$ and becomes 0 once c's actual degree $∣ N_{j}^{t} (c) ∣$ attains its ideal value d_j(c). The selection of candidate c is also influenced by (ii) the actual degrees of v and c (line 6), with respect to the bivariate degree distribution $\overset{=}{χ_{j}}$ (suitably binned to 2ε-sized buckets). Likewise (iii) the joint attributes of v and c influence the candidate selection (line 7), reflecting the bivariate attribute distributions β_i,j. Finally, (iv) each new triangle arising from the addition of edge (v, c) contributes $(w_{j}^{Δ} - 1)$ to the total triadic bias (line 8) which is accumulated in $a_{Δ}^{t} (c)$. The factors (i)-(iv) are used to construct a probability distribution $p_{j}^{t}$ over the set of candidate new layer-j neighbors (lines 9,10), using which one of the candidates w is selected (line 11). The edge (v, w) is then added to network layer j by augmenting the set of layer j edges emanating from v (line 13).
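The candidate-weighting idea can be sketched in a few lines. The code below is a simplified illustration and not Listing 5: it covers only factors (i) and (iv), takes the candidate set to be all non-neighbors of v, starts the triadic bias at 1, clamps it at 0, and combines the factors by a plain product. Each of these choices, along with every function and variable name, is an assumption of the sketch.

```python
import math
import random


def squash(x):
    """Smooth map of an edge deficit into [0, 1): exp(-1/x) for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def candidate_weights(v, neighbors, ideal_degree, w_triad):
    """Unnormalized selection weight for each candidate new neighbor of v."""
    weights = {}
    for c in neighbors:
        if c == v or c in neighbors[v]:
            continue  # assumed candidate set: non-neighbors other than v
        # (i) edge deficit of c, squashed into [0, 1)
        deficit = squash(ideal_degree[c] - len(neighbors[c]))
        # (iv) each triangle closed by (v, c) adds (w_triad - 1) to the bias
        triangles = len(neighbors[v] & neighbors[c])
        triad = max(0.0, 1.0 + triangles * (w_triad - 1.0))
        weights[c] = deficit * triad  # assumed combination rule: product
    return weights


def add_edge(v, neighbors, ideal_degree, w_triad, rng):
    """Pick one candidate at random in proportion to its weight and link it to v."""
    weights = candidate_weights(v, neighbors, ideal_degree, w_triad)
    if sum(weights.values()) == 0:
        return None  # no viable candidate
    candidates = list(weights)
    w = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
    neighbors[v].add(w)
    neighbors[w].add(v)
    return w


# Toy layer (invented): v-a and a-b are linked; c and d are isolated.
neighbors = {"v": {"a"}, "a": {"v", "b"}, "b": {"a"}, "c": set(), "d": set()}
ideal_degree = {"v": 2, "a": 2, "b": 2, "c": 1, "d": 0}

weights = candidate_weights("v", neighbors, ideal_degree, w_triad=3.0)
chosen = add_edge("v", neighbors, ideal_degree, 3.0, random.Random(0))
print(weights, chosen)
```

In the toy layer, b and c have the same deficit, but linking v to b would close a triangle through a, so b's weight is three times that of c; d already meets its ideal degree of 0 and receives weight 0.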

2.4 Validating generated networks

We have shown how, from a network survey, one may specify a real-world risk network $D$ (see Section 2.1), derive from $D$ a statistical network model $M$ (see Section 2.2), and then use the model $M$ to sample new artificial risk networks $D_{1}^{'}, D_{2}^{'}, D_{3}^{'}, \dots$ (see Section 2.3). We now present techniques to quantify the divergence between the original real risk network $D$ and a generated artificial risk network $D^{'}$ . These techniques are particularly relevant to assessing the possible degeneracy of model $M$ , i.e. its potential inability to generate networks that reflect characteristics of the network from which the model was derived.

We begin by considering how one may measure the similarity or difference between two (m, l, p) statistical network models

\begin{matrix} M_{1} = & {(α_{i}, β_{i, j}, X_{j}, \overset{=}{χ_{j}}, P_{k})}_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p} \\ M_{2} = & {(α_{i}^{'}, β_{i, j}^{'}, X_{j}^{'}, \overset{=}{χ_{j}^{'}}, P_{k}^{'})}_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p} . \end{matrix}

Because $M_{1}$ and $M_{2}$ each consist of a set of distributions, the two models are readily comparable only if the domains of these distributions agree. In particular, for two models to be comparable it is necessary that

\begin{matrix} \mathrm{Domain} (α_{i}) = \mathrm{Domain} (α_{i}^{'}) \\ \mathrm{Domain} (β_{i, j}) = \mathrm{Domain} (β_{i, j}^{'}) \end{matrix}

for all i = 1,. . ., m, j = 1,. . ., l. Likewise, the set of ideal layer-j degree classes (referred to via $X_{j}$ and $X_{j}^{'}$ ), and pathogen-k prevalence classes (referred to via $P_{k}$ and $P_{k}^{'}$ ) must be compatible:

\begin{matrix} \mathrm{Domain} (X_{j}) = \mathrm{Domain} (X_{j}^{'}) \\ \mathrm{Domain} (P_{k}) = \mathrm{Domain} (P_{k}^{'}) \end{matrix}

for all j = 1,. . ., l, k = 1,. . ., p. The above conditions can be met by any pair of (m, l, p) statistical network models by using a suitable common refinement of the categorical spaces U_i and $U_{i}^{'}$ (for i = 1,. . ., m), classifications $X_{j}, X_{j}^{'}$ (for j = 1,. . ., l), and $P_{k}, P_{k}^{'}$ (for k = 1,. . ., p).

Given two comparable models, how might one quantify their similarity or difference?

Since statistical network models are tuples of distributions, we begin by considering how one may assess the similarity between two probability distributions f, f′ over a common set X. Many approaches exist, including histogram intersection [Barla et al., 2003], the Chi-square statistic [Read, 1993], quadratic form distance, match distance, Kolmogorov-Smirnov distance [Stephens, 1974], earth mover's distance, Kullback-Leibler divergence (sometimes called information divergence, information gain, or relative entropy; see Kullback and Leibler [1951]), and Jensen-Shannon divergence (also known as information radius or IRad). Here we choose to measure the difference between two probability distributions as

Δ (f, f^{'}) \overset{def}{=} \sqrt{\mathrm{IRad} (f, f^{'})}

because by doing so, we obtain a metric space on the set of all probability distributions over an underlying set [Endres and Schindelin, 2003, Österreicher and Vajda, 2003]. The IRad of two distributions is defined to be their mean Kullback-Leibler divergence from their average (as distributions). Applying this to the constituent distributions in the two models, we get

\begin{matrix} Δ (α_{i}, α_{i}^{'}) & \overset{def}{=} \sqrt{\sum_{x \in U_{i}} \left( α_{i} (x) \log \frac{α_{i} (x)}{{\overset{‒}{α}}_{i} (x)} + α_{i}^{'} (x) \log \frac{α_{i}^{'} (x)}{{\overset{‒}{α}}_{i} (x)} \right)} \\ Δ (β_{i, j}, β_{i, j}^{'}) & \overset{def}{=} \sqrt{\sum_{(x, y) \in U_{i} \times U_{i}} \left( β_{i, j} (x, y) \log \frac{β_{i, j} (x, y)}{{\overset{‒}{β}}_{i, j} (x, y)} + β_{i, j}^{'} (x, y) \log \frac{β_{i, j}^{'} (x, y)}{{\overset{‒}{β}}_{i, j} (x, y)} \right)} \end{matrix}

where ${\overset{‒}{α}}_{i}$ is the average of α_i and $α_{i}^{'}$ (as a distribution over U_i), and ${\overset{‒}{β}}_{i, j}$ is the average of β_i,j and $β_{i, j}^{'}$ (as a distribution over U_i × U_i). Analogous distance measures may be defined between the two models’ bivariate degree distributions $\overset{=}{χ_{j}}$ and $\overset{=}{χ_{j}^{'}}$ , as well as between corresponding in-class univariate degree distributions χ_j;s and $χ_{j; s}^{'}$ (taken from $X_{j}$ and $X_{j}^{'}$ , respectively). Because these distributions are defined over identical partitions ${C_{s}^{j} ∣ s = 1, \dots, S_{j}}$ of $\prod_{i = 1}^{m} U_{i}$ we can aggregate the distances by summing the divergences between corresponding class distributions:

Δ (X_{j}, X_{j}^{'}) \overset{def}{=} \sum_{s = 1}^{S_{j}} Δ (χ_{j; s}, χ_{j; s}^{'}) .
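As an illustration, the difference measure Δ between two discrete distributions can be sketched in a few lines of Python. This is a hedged sketch rather than the authors' code: the function names are hypothetical, distributions are represented as dictionaries from outcomes to probabilities, natural logarithms are assumed, and the information radius follows the verbal definition (the mean of the two Kullback-Leibler divergences from the average distribution), so it carries a factor of 1/2.

```python
import math


def kl_divergence(f, g):
    """Kullback-Leibler divergence of f from g; terms with f(x) = 0 contribute 0."""
    return sum(p * math.log(p / g[x]) for x, p in f.items() if p > 0)


def delta(f, f_prime):
    """Square root of the information radius of two discrete distributions."""
    support = set(f) | set(f_prime)
    p = {x: f.get(x, 0.0) for x in support}
    q = {x: f_prime.get(x, 0.0) for x in support}
    # The average distribution is positive wherever p or q is positive,
    # so the logarithms in kl_divergence are always well defined.
    avg = {x: 0.5 * (p[x] + q[x]) for x in support}
    irad = 0.5 * (kl_divergence(p, avg) + kl_divergence(q, avg))
    return math.sqrt(max(irad, 0.0))  # guard against tiny negative rounding error
```

Under these assumptions, identical distributions are at distance 0, and two distributions with disjoint supports are at the maximum distance, the square root of log 2.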

Having defined the distance between corresponding distributions in the two models, we use the L_∞ norm to extend these to a definition of distance between statistical network models:

Δ^{*} (M_{1}, M_{2}) \overset{def}{=} \max \left( \max_{i = 1}^{m} Δ (α_{i}, α_{i}^{'}), \; \max_{j = 1}^{l} \max_{i = 1}^{m} Δ (β_{i, j}, β_{i, j}^{'}), \; \max_{j = 1}^{l} Δ (X_{j}, X_{j}^{'}), \; \max_{j = 1}^{l} Δ (\overset{=}{χ_{j}}, \overset{=}{χ_{j}^{'}}) \right)

(1)

By considering the worst-case divergences of all constituent distributions within the two models, we hope to produce a holistic assessment of the relative validity of each model against the other [Bharathy and Silverman, 2013].
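The worst-case aggregation can be sketched as follows. The function name and the dictionary layout are hypothetical: the sketch assumes the component distances between corresponding distributions have already been computed and are keyed by a label identifying the distribution.

```python
def model_distance(component_distances):
    """Aggregate component distances with the L-infinity norm.

    component_distances: dict mapping a label, e.g. ("beta", i, j), to the
    distance between the corresponding distributions of the two models.
    Returns (label of the worst component, its distance).
    """
    if not component_distances:
        raise ValueError("no component distances supplied")
    worst = max(component_distances, key=component_distances.get)
    return worst, component_distances[worst]
```

Returning the label together with the maximum makes it straightforward to see which constituent distribution is responsible for the reported distance.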

The distance between two risk networks $D$ and $D^{'}$ (which have comparable models) is now taken as

Δ (D, D^{'}) \overset{def}{=} Δ^{*} (M (D), M (D^{'})) .

(2)

Note that by not incorporating divergences of pathogen prevalence rates p_k (interpreted as Bernoulli distribution parameters) into the definition of Δ*, we ensure that $Δ (D, D^{'})$ measures only the extent to which $D, D^{'}$ differ as networks: the pathogen prevalence rates in the two risk networks may diverge arbitrarily without influencing the value of $Δ (D, D^{'})$ .

We have thus transformed the set of all risk networks generated from comparable statistical network models into a metric space in which distance is inverse to similarity in network structure. In practice, the metric Δ (on risk networks) will allow us to detect when an (artificial) generated network $D^{'}$ is very different from the surveyed (real-world) risk network $D$ from which the generative statistical network model $M (D)$ was defined. By using rejection sampling techniques [Robert and Casella, 2005] we may ensure that the artificial networks which are used as starting points of our simulation are not exceptionally different from the real-world networks from which our statistical network models are derived.
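A minimal sketch of such a rejection-sampling loop is shown below. The function and parameter names are hypothetical: `generate` stands for any procedure that samples an artificial network from the model, `distance_to_reference` for the map from a generated network to its distance Δ from the surveyed network, and the acceptance threshold and retry cap are assumptions that would need to be chosen for the application.

```python
def sample_acceptable_network(generate, distance_to_reference, threshold, max_tries=1000):
    """Draw networks until one lies within `threshold` of the reference network.

    Returns (accepted network, number of draws used).
    Raises RuntimeError if no draw is accepted within max_tries attempts.
    """
    for attempt in range(1, max_tries + 1):
        candidate = generate()
        if distance_to_reference(candidate) <= threshold:
            return candidate, attempt
    raise RuntimeError(f"no network within {threshold} after {max_tries} draws")
```

A persistently low acceptance rate in this loop would itself be a signal that the model may be degenerate in the sense discussed above.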

In the context of dynamic network simulation (to be described), Δ will allow us to keep track of the extent of structural divergence between the initial artificial risk network $D^{'}$ and its instantaneous evolute $D_{t}^{'}$ at time t (over the course of the simulation trajectory). If at some point in the simulation trajectory we discover that $D^{'}$ and $D_{t}^{'}$ are significantly different (i.e. $Δ (D^{'}, D_{t}^{'})$ exceeds some prescribed threshold), the definition of Δ permits us to dissect the contributions of the constituent distributions in $M (D_{t}^{'})$ and $M (D^{'})$ to determine which aspects of the models are most responsible for the divergence. In such circumstances, either the trajectory can be discarded because it has produced an exceptional network (a form of rejection sampling from the space of all system trajectories), or the dynamism model parameters (to be described) may be altered to allow less drift in the risk network's structure over time.

Next, in Section 3, we shall extend the model to support pathogen dynamics. Then, in Section 4, we extend it further to capture the dynamic evolution of network topologies.

3 Modeling pathogens: the risk process

An individual v's infection status may change with respect to a pathogen $P_{k}$ (for some k in 1,. . ., p) when v engages in risk behaviors (via layer j relationships, j = 1,. . ., l) with a risk partner w who is positive for $P_{k}$ . We refer to aspects of the framework which speak to such events, as the risk process for pathogen $P_{k}$ , the details of which are described in what follows. While the description is from the vantage point of a fixed layer j and time t, it applies to all layers j = 1,. . ., l at all times t > 1. At a given time t, each individual is represented as a node v ∈ V^t within a network

D^{t} \overset{def}{=} (x_{i}^{t}, V^{t}, E_{j}^{t}, A_{k}^{t}, d_{j})_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p} .

The set $N_{j}^{t} (v) \subseteq V^{t}$ represents the potential layer j risk partners for v within a fixed temporal window of duration Θ_j, i.e. during the time [t, t + Θ_j). Typically, Θ_j is related to the definition of edge relation in the survey, e.g. in SFHR, subjects were asked for the number of risk behavior partners they had in the past 30 days so Θ₁ would be taken as 1 month.

Individual v has the propensity to sporadically engage in risk acts across layer j of their network, with a partner randomly chosen from $N_{j}^{t} (v)$ . In anticipation of this, when a node v first enters the network, we assign it a propensity $t_{j}^{R} (v)$ for risk activities in layer j. This number is assumed to be time-invariant for each individual, and is randomly chosen from the positive reals using a truncated Gaussian [Robert, 1995] with (time-invariant) mean $μ_{j}^{R}$ and (time-invariant) standard deviation $σ_{j}^{R}$ . A Gaussian distribution was adopted in order to allow for controllable variation (across individuals) in the appetite for risk acts (per network layer j). The selection of $t_{j}^{R} (v)$ occurs independently for all individuals v and layers j. The quantity $t_{j}^{R} (v) ∕ ∣ N_{j}^{t} (v) ∣$ represents the expected time between successive layer j risk impulses experienced by v. Statistically speaking, one may say that on average, every $t_{j}^{R} (v)$ months, individual v is expected to have engaged in roughly $∣ N_{j}^{t} (v) ∣ \approx d_{j} (v)$ risk events via layer j edges. Following previous work on the outcomes of HIV transmission in the context of unsafe sex, risk impulse streams are generated by independent Poisson processes operating at each individual v [Barta et al., 2010, Xia et al., 2012]. To achieve the above characteristics (regarding mean times between impulses) in a memoryless fashion, the time between successive risk impulses follows an exponential distribution with rate $∣ N_{j}^{t} (v) ∣ ∕ t_{j}^{R} (v)$ . Upon experiencing a layer j risk impulse at time t, node v selects a partner w uniformly at random from its layer j neighbors $N_{j}^{t} (v)$ , and engages in a mutual layer j risk act with w.
In applications where Poisson processes are not good models of risk impulse streams, a different class of stochastic processes could be instrumented at each node, with $∣ N_{j}^{t} (v) ∣ ∕ t_{j}^{R} (v)$ serving as a parameter regulating the intensity of the process.
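The risk impulse stream of a single node can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the truncated Gaussian is sampled by simple rejection of non-positive draws, and the neighbor set is held fixed over the simulated window, whereas in the full model it may change over time.

```python
import random


def sample_propensity(mu, sigma, rng):
    """Draw from a Gaussian truncated to the positive reals by simple rejection."""
    while True:
        x = rng.gauss(mu, sigma)
        if x > 0:
            return x


def risk_impulse_times(t_start, t_end, propensity, neighbors, rng):
    """Simulate one node's layer-j risk impulses in [t_start, t_end).

    Inter-impulse times are exponential with rate |N| / propensity; at each
    impulse a partner is drawn uniformly at random from the neighbor set.
    Returns a list of (time, partner) pairs in increasing time order.
    """
    events = []
    if not neighbors:
        return events  # a node without neighbors experiences no risk acts
    partners = sorted(neighbors)  # fixed order for reproducibility
    rate = len(partners) / propensity
    t = t_start + rng.expovariate(rate)
    while t < t_end:
        events.append((t, rng.choice(partners)))
        t += rng.expovariate(rate)
    return events
```

For example, with a propensity of 1 month and three neighbors, the sketch produces on average three risk acts per month, spread uniformly over the three partners.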

During a layer j risk act involving v and w, one or more of the pathogens $P_{k}$ (for k = 1,. . ., p) may propagate. The likelihood of this is taken to be 0 if both individuals have the same infection status, i.e. when both $v, w \notin A_{k}^{t}$ or when both $v, w \in A_{k}^{t}$ . If the individuals are serodiscordant with respect to pathogen $P_{k}$ (i.e. precisely one of them is infected), then the probability of transmission is modeled using an infectiousness curve I_j,k. For concreteness of exposition, let us assume v is positive for pathogen $P_{k}$ while w is not. The infectiousness curve I_j,k then maps the age of v's infection (with respect to $P_{k}$ ) to the probability of the pathogen's transmission during a layer j risk act. To support this within the model, it is necessary for the risk network representation to be augmented so as to maintain information about the time when individuals first become positive for each pathogen $P_{k}$ . We record this information via p functions $t_{k}^{+} : A_{k}^{t} \to R$ (for k = 1,. . ., p). Finally, the susceptibility of w to becoming infected by pathogen k may be impacted by the infection status of w with respect to another pathogen k′ ∈ {1,. . ., p} where k′ ≠ k. We capture this via a scalar susceptibility multiplier γ_k,k′ ∈ [0, + ∞) which amplifies or diminishes the transmission likelihood mandated by the infectiousness curve. ⁴ Aggregating these factors, we get that the probability of w becoming infected by v during a single layer j risk act involving the pair (v, w) is

\max \left( 0, \min \left( 1, I_{j, k}^{+} (t_{k}^{+} (v)) \cdot \prod_{k^{'} \neq k \, : \, w \in A_{k^{'}}^{t}} γ_{k, k^{'}} \right) \right)
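The clamped product can be written as a short Python sketch. The function name is hypothetical, the infectiousness curve is passed in as an arbitrary callable, and the age of infection is assumed to be supplied by the caller as the current time minus the recorded infection time, in keeping with the description of the curve as a function of the age of v's infection.

```python
def transmission_probability(infectiousness, age, multipliers):
    """Probability that the infected partner transmits during one risk act.

    infectiousness: callable mapping infection age (months) to a probability.
    age: age of the infected partner's infection (assumed: t - t_plus).
    multipliers: the susceptibility multipliers gamma[k][k'] for every other
                 pathogen k' that the uninfected partner already carries.
    """
    p = infectiousness(age)
    for gamma in multipliers:
        p *= gamma
    return max(0.0, min(1.0, p))  # clamp into [0, 1]
```

The clamp matters because the multipliers are unbounded above, so the raw product may exceed 1.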

While it is easy to update $t_{k}^{+}$ during the course of a trajectory (i.e. as previously uninfected nodes acquire the pathogen), we must also specify the infection times for individuals who were chosen to be infected at the very outset of the simulation, i.e. in the MakePathogens procedure (see Listing 3). We do this for each initially infected v in V¹ by initializing $t_{k}^{+} (v)$ to a value selected uniformly at random from the interval $[1 - T_{k}^{+}, 1]$ , where the values $T_{k}^{+}$ are new model parameters (k = 1,. . ., p).
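The initialization of infection times for a single pathogen can be sketched as follows; the function name and the dictionary representation are hypothetical choices made only for illustration.

```python
import random


def initial_infection_times(initially_infected, T_plus, rng):
    """Assign each initially infected node a time drawn uniformly from [1 - T_plus, 1]."""
    earliest = 1.0 - T_plus
    # Iterate in sorted order so that a seeded generator gives reproducible results.
    return {v: rng.uniform(earliest, 1.0) for v in sorted(initially_infected)}
```

Since the simulation starts at time 1, a node assigned the time 1 - T_plus begins the trajectory with an infection that is already T_plus months old.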

The 2l parameters $μ_{j}^{R}$ and $σ_{j}^{R}$ (for j = 1,. . ., l) are added to the model, as are the p initialization parameters $T_{k}^{+}$ (k = 1,. . ., p) and the lp infectiousness curves I_j,k (for j = 1,. . ., l and k = 1,. . ., p) that capture the time dependencies of transmission risks of pathogen $P_{k}$ via layer j risk acts. The model is thus augmented to support the risk process via the parameters below.

4 Modeling network dynamism

In this section, we extend the model to include additional parameters that specify the mechanisms governing network evolution over time, capturing the fact that:

An individual's risk partnerships may change if and when they decide to abandon an existing risk partner (or when the risk partner decides to abandon them). Loss of risk partners may cause individual social instability, inducing the individual to seek new risk partners. We refer to the losing and gaining of risk partners as the churn process; it is the subject of Section 4.1.
The population may change because an individual enters or leaves the risk network. We refer to this as the population process; it is the subject of Section 4.2.
As individuals age over time, this may alter their risk partner preferences. We refer to this as the aging process; it is the subject of Section 4.3.

Each of the three processes is described below. While the narrative is written from the vantage point of a single layer j of the risk network, the processes described are replicated and operate concurrently at each of the j = 1,. . ., l layers.

4.1 The churn process

While the set $N_{j}^{t} (v)$ represents the potential layer j risk partners for v at time t, it is possible for individuals to abandon (or be abandoned by) their risk partners over time. Social instability due to a loss of layer j risk partners may induce individuals to seek new risk partners to compensate for loss of social context. The central premise of our model concerning partner “churn” is the idea that each individual v has an ideal degree d_j(v), which is the ideal size of v's ego network in layer j, based on v's stable personality. This is reflected in the fact that the ideal degree at layer j is selected using the degree distribution $X_{j}$ in procedure MakePopulation (lines 8-10 of Listing 2). While d_j is permitted to vary over the population, here it is assumed to be fixed over time. ⁵

On the other hand, the actual membership (and cardinality) of v's layer j risk partners $N_{j}^{t} (v)$ is permitted to vary with time, albeit in a controlled fashion to be described in what follows.

Each individual v has the propensity to change the membership of $N_{j}^{t} (v)$ , in an act we refer to as “churn”. At creation time, each node v is assigned propensity for churn $t_{j}^{C} (v)$ . In the current model, this number is assumed to be time-invariant for each individual, and is randomly chosen from the positive reals using a truncated Gaussian [Robert, 1995] with (time invariant) mean $μ_{j}^{C}$ and

Param	Description	Units/Range
$μ_{j}^{R}$	Mean time between inter layer j risk impulses	Months
$σ_{j}^{R}$	Inter layer j risk impulse std. dev.	Months
$T_{k}^{+}$	Age interval for initial $P_{k}$ infections.	Months
$I_{j, k}^{+}$	Infectiousness curve for $P_{k}$ via layer j.	Fcn. of age
γ_k,k′	Multiplier for susceptibility to pathogen k given prior infection by k′ ∈ {1,...,p}; k′ ≠ k.	Scalar

Param	Description	Units/Range
$μ_{j}^{C}$	Layer j churn interval mean	Months
$σ_{j}^{C}$	Layer j churn interval std. dev.	Months
$w_{j}^{S}$	Layer j degree stability bias	Positive real

Param	Description	Units/Range
r_p	Population growth rate every 10 years	Percentage (real)
f_tr	Fraction that are “transient”	Between 0 and 1
μ_tr	Mean duration of transients' lifetimes	Months
σ_tr	Std. dev. of transients' lifetimes	Months
μ_st	Mean duration of steadies' lifetimes	Months
σ_st	Std. dev. of steadies' lifetimes	Months

Attribute	θ	p-value
Transitive closure	3.592	***
Gender homophily (all)	0.058	0.566
Race/Ethnic homophily (all)	1.205	***
Age homophily (all)	0.367	**
Number of injection partners	0.460	***

Number of layers	Pathogen 1 prevalence at 60 months
l = 2 (and p = 1)	70%
l = 3 (and p = 1)	81%
l = 4 (and p = 1)	89%
Base Scenario (p = 1 and l = 1)	42%

Number of Pathogens	Average prevalence at 60 months (across all p pathogens)	Ave. pairwise correlation of pathogen occurrence
p = 2 (and l = 1)	42%	0.86
p = 3 (and l = 1)	43%	0.76
p = 4 (and l = 1)	44%	0.72
Base Scenario (p = 1 and l = 1)	42%

Number of Pathogens p Number of Layers l	Average prevalence at 60 months (across all p pathogens)	Ave. pairwise correlation of pathogen occurrence
p = l = 2	42%	0.12
p = l = 3	43%	−0.11
p = l = 4	42%	0.08
Base Scenario (p = l = 1)	42%

Listing 1
	Input: statistical network model $(α_{i}, β_{i, j}, X_{j}, \overset{=}{χ_{j}}, P_{k})_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p}$ ; population size n.
	Output: risk network $(x_{i}, V, E_{j}, A_{k}, d_{j})_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p}$ .
1	$(\{x_{i}\}, \{d_{j}\}, V)$ ← MakePopulation $(n, \{α_{i}\}, \{X_{j}\})_{i = 1, \dots, m; j = 1, \dots, l}$
2	$\{A_{k}\}$ ← MakePathogens $(V, \{P_{k}\})_{k = 1, \dots, p}$
3	E ← MakeRelations $(\{β_{i, j}\}, \{\overset{=}{χ_{j}}\}, \{x_{i}\}, \{d_{j}\}, V)_{i = 1, \dots, m; j = 1, \dots, l}$
4	return $(x_{i}, V, E_{j}, A_{k}, d_{j})_{i = 1, \dots, m; j = 1, \dots, l; k = 1, \dots, p}$ .

Listing 2 (MakePopulation)
	Input: population size n, attribute distributions {α_i}_{i=1..m}, degree distributions $\{X_{j}\}_{j = 1..l}$ .
	Output: ({x_i}, {d_j}, V)_{i=1..m; j=1..l}.
1	V := {v₁, v₂, . . ., v_n}.
2	foreach v_k in V do
3	    // Set the attributes of individual v_k.
4	    foreach i in 1...m do
5	        x_i(v_k) := an element of U_i randomly selected via α_i.
6	    // Set individual v_k's ideal ego net size at each layer.
7	    foreach j in 1...l do
8	        $s ≔ f_{j}^{C} (v_{k})$ , the layer-j ideal degree class of v_k.
9	        τ := χ_{j;s}, the corresponding layer-j ideal degree distribution, taken from $\mathrm{Range} (X_{j})$ .
10	        d_j(v_k) := an integer randomly chosen via pdf τ.
11	return ({x_i}, {d_j}, V)_{i=1..m; j=1..l}.

Listing 3 (MakePathogens)
	Input: population V, pathogen prevalences $\{P_{k}\}_{k = 1..p}$ .
	Output: {A_k}_{k=1..p}.
1	A₁ := A₂ := ... := A_p := $\emptyset$
2	foreach k in 1...p do
3	    foreach v_i in V do
4	        $t ≔ f_{k}^{B} (v_{i})$ , identifying the pathogen-k class of v_i in $\mathrm{Domain} (P_{k})$ .
5	        τ := p_{k;t}, the corresponding pathogen-k prevalence rate in $\mathrm{Range} (P_{k})$ .
6	        if Random(0, 1) < τ then
7	            A_k := A_k ∪ {v_i}
8	return {A_k}_{k=1..p}

Listing 4 (MakeRelations)
	Input: bivariate attribute distributions {β_i,j}_{i=1..m; j=1..l}, bivariate degree distributions $\{\overset{=}{χ_{j}}\}_{j = 1..l}$ , individual attributes {x_i}_{i=1..m} and ideal degrees {d_j}_{j=1..l}, the population V.
	Output: E
1	$E ≔ \emptyset$
2	foreach i in 1...|V| do
3	    foreach j in 1...l do
4	        $N_{j} (v_{i}) ≔ \emptyset$ .
5	        foreach e in 1...d_j(v_i) do
6	            Schedule AddEdge(v_i, j) to take place at time $\frac{1}{e i + 1}$ .
7	Wait until time 1.
8	$E ≔ ⋃_{i = 1}^{∣ V ∣} ⋃_{w \in N (v_{i})} \{(v_{i}, w)\}$
9	return E

Listing 5 (AddEdge)
	Input: individual v, layer j.
1	// Determine candidate new neighbors for v.
2	$C_{j}^{t} (v) ≔ V_{j}^{t} \setminus (N_{j}^{t} (v) \cup \{v\})$ .
3	if $∣ C_{j}^{t} (v) ∣ > 0$ then
4	    foreach c in $C_{j}^{t} (v)$ do
5	        Compute the bias due to degree constraints:
	        $a_{δ}^{t} (c) ≔ \begin{cases} e^{- 1 ∕ (d_{j} (c) - ∣ N_{j}^{t} (c) ∣)} & \text{if } ∣ N_{j}^{t} (c) ∣ < d_{j} (c) \\ 0 & \text{otherwise.} \end{cases}$
6	        Compute the bias due to the bivariate degree distribution:
	        $a_{χ}^{t} (c) ≔ \overset{=}{χ_{j}} (∣ N_{j}^{t} (v) ∣ - ∊, ∣ N_{j}^{t} (v) ∣ + ∊, ∣ N_{j}^{t} (c) ∣ - ∊, ∣ N_{j}^{t} (c) ∣ + ∊)$ .
7	        Compute the bias due to bivariate attribute distributions:
	        $a_{β}^{t} (c) ≔ \prod_{i = 1}^{m} β_{i, j} (x_{i} (v), x_{i} (c))$ .
8	        Compute the bias due to triadic closures:
	        $a_{Δ}^{t} (c) ≔ ζ (w_{j}^{Δ} - 1) \cdot Δ_{j}^{t} (v, c)$
	        where $Δ_{j}^{t} (v, c)$ = number of layer j triangles formed on adding layer j edge (v, c).
9	        Compute propensity of edge (v, c) as the product of 4 biases:
	        $ω_{j}^{t} (c) ≔ a_{δ}^{t} (c) \cdot a_{χ}^{t} (c) \cdot a_{β}^{t} (c) \cdot a_{Δ}^{t} (c)$ .
10	    Normalize propensity to obtain a distribution over $C_{j}^{t} (v)$ :
	    $p_{j}^{t} (c) ≔ \frac{ω_{j}^{t} (c)}{\sum_{c^{'} \in C_{j}^{t} (v)} ω_{j}^{t} (c^{'})}$ .
11	    w := choose from $C_{j}^{t} (v)$ randomly according to distribution $p_{j}^{t}$ .
12	    // Add the layer j edge connecting v to w.
13	    $N_{j}^{t} (v) ≔ N_{j}^{t} (v) \cup \{w\}$

Name	Possible values (U_i)
x₁ : Gender	{Male, Female}
x₂ : Ethnicity	{White, Hispanic, African-American, Other}
x₃ : AgeBinned	{[15-20), [20-25), [25-30), [30-35), [35-40), [40-45), [45-50), [50-55)}
x₄ : DegreeBinned	{[0-2), [2-4), [4-10), [10-20)}

β₂	White	Hispanic	African-American	Other
White	115/178	18/178	43/178	7/178
Black	11/157	112/157	31/157	7/157
Hispanic	32/175	25/175	117/175	1/175
Other	2/7	4/7	1/7	0

β₃	[15-20)	[20-25)	[25-30)	[30-35)	[35-40)	[40-45)	[45-50)	[50-55)
[15-20)	1/4	1/4	2/4	0	0	0	0	0
[20-25)	1/26	1/26	7/26	6/26	7/26	3/26	1/26	0
[25-30)	1/147	8/147	41/147	45/147	38/147	12/147	2/147	0
[30-35)	0	5/153	33/153	50/153	43/153	16/153	5/153	1/153
[35-40)	0	4/161	20/161	41/161	56/161	29/161	6/161	5/161
[40-45)	0	2/137	14/137	27/137	44/137	40/137	3/137	7/137
[45-50)	0	0	2/20	6/20	6/20	4/20	2/20	0
[50-55)	0	0	0	2/14	6/14	4/14	0	2/14

$\overset{=}{χ}$	[0-2)	[2-4)	[4-10)	[10-20)
[0-2)	77/158	37/158	29/158	15/158
[2-4)	57/203	77/203	40/203	29/203
[4-10)	42/195	50/195	64/195	39/195
[10-20)	15/100	23/100	31/100	31/100


Figure 1. HIV prevalence in SFHR-based networks of size 1000, 5000, 10,000 and 25,000.

Figure 2. New HIV infections over time in networks of size 1000, 5000, 10,000 and 25,000.
