Abstract
Partnership concurrency is a major driver of permeability of social networks to diffusion, and an important modeling target in the context of sexually transmitted infections. A seemingly unrelated phenomenon of concern in modeling social networks is isolation avoidance—the tendency of individuals to maintain at least one tie. Although concurrency bias and bias in isolate formation would naively seem to be distinct, we here show that their respective ERGM expressions (edge/concurrent tie and edge/isolate families, and their regular extensions) are equivalent, and that both are equivalent to a special case of the geometrically weighted degree families. In addition to being statistically useful, this equivalence provides insight into the essential connection between these apparently different structural phenomena.
Keywords: ERGMs, model parameterization, social isolation, concurrency, Markov graphs, k-stars, geometrically weighted degree
Exponential family random graph models (ERGMs) provide a powerful framework for describing, simulating, and measuring structural biases within social networks associated with social processes (Wasserman and Robins, 2005; Robins and Morris, 2007). Given an order-N random graph, Y, on support 𝓎N, the probability mass function (pmf) of Y may be written in ERGM form as
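\[ \Pr(Y = y \mid \theta, X) = \frac{\exp\left(\theta^{T} t(y, X)\right)}{\sum_{y' \in \mathcal{Y}_N} \exp\left(\theta^{T} t(y', X)\right)} \, \mathbb{I}_{\mathcal{Y}_N}(y) \tag{1} \]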
where X is a covariate set, t : 𝓎N → ℝp is a vector of sufficient statistics, θ ∈ ℝp is a vector of parameters, and 𝕀𝓎N is the counting measure on the support. By making different choices of t, a wide range of properties can be modeled, including both heterogeneity and conditional dependence among edge variables (see, e.g., Frank and Strauss, 1986; Wasserman and Pattison, 1996; Robins and Pattison, 2005; Snijders et al., 2006). In turn, studying the properties of a hypothesized model can give one insight into the behavior of the process giving rise to it, including its dependence structure, stability, and extrapolative behavior (Snijders, 2010; Butts, 2011; Schweinberger, 2011).
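For very small N, the ERGM form can be evaluated exactly by enumerating the whole support. The following is a minimal Python sketch under that assumption; it is not how ERGMs are fit in practice and is not taken from statnet. The names `all_graphs`, `ergm_pmf`, and `edge_count` are hypothetical, covariates X are omitted, and the only statistic used in the usage line is a plain edge count.

```python
from itertools import combinations, product
from math import exp

def all_graphs(n):
    """Yield every simple undirected graph on n vertices as a tuple of edges."""
    dyads = list(combinations(range(n), 2))
    for mask in product((0, 1), repeat=len(dyads)):
        yield tuple(d for d, bit in zip(dyads, mask) if bit)

def ergm_pmf(n, theta, stats):
    """Return {graph: probability} by exhaustive enumeration of the support.

    theta : sequence of parameters
    stats : function mapping (edges, n) to a sequence of sufficient statistics
    """
    weights = {
        g: exp(sum(th * s for th, s in zip(theta, stats(g, n))))
        for g in all_graphs(n)
    }
    z = sum(weights.values())  # normalizing constant (sum over the support)
    return {g: w / z for g, w in weights.items()}

def edge_count(edges, n):
    """Sufficient statistic vector containing only the number of edges."""
    return (len(edges),)

# Usage: an edge-only model on 4 vertices (2**6 = 64 graphs in the support).
pmf = ergm_pmf(4, theta=(-0.5,), stats=edge_count)
print(len(pmf), sum(pmf.values()))
```

With only an edge count the model is a homogeneous Bernoulli graph, which gives a simple independent check on the normalization. Enumeration is feasible only for a handful of vertices, since the support grows as 2^(N(N−1)/2).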
In this paper, we show that two families of models for seemingly distinct structural biases—concurrency and isolate formation—are in fact formally equivalent. In addition to being statistically useful, this equivalence provides insight into how these two types of structural phenomena are related.
In the development that follows, we will focus on models for simple graphs of fixed order N (i.e., taking 𝓎N equal to the set of all undirected loopless graphs on N vertices). This is the case for which concurrency per se is most meaningful, and most typically studied. Generalization of these results to the directed case is possible, though we do not pursue this here.
1 Two Model Families
We begin by introducing the two model families of interest, showing their equivalence in Section 2.
1.1 Edge/Concurrent Tie Family
Although concurrency can be modeled in various ways, one of the simplest and most natural (implemented, e.g., as the concurrentties statistic in statnet (Handcock et al., 2008; Hunter et al., 2008)) is to include a statistic counting the number of edgewise concurrencies; i.e., the sum over all vertices of the number of edges incident upon each vertex after the first. When associated with a negative parameter, this statistic expresses the notion that the addition of a tie to an existing graph (respectively, removal) is penalized (respectively, favored) for each prospective endpoint that already has at least one other edge. Formally, this statistic is defined as
\[ t_c(y) = \sum_{i=1}^{N} \max\left(0, d_i(y) - 1\right), \]
where d_i(y) denotes the degree of vertex i in y.
Let \(t_e(y) = \sum_{i < j} y_{ij}\) be the standard edge count statistic. We may then define the family of (regularly extended) edge/concurrent tie models by setting θC = (θe, θc, θr) and tC = (te, tc, tr), where tr is any set of additional statistics, and taking Pr(Y = y|θC, X) as per Equation 1. The parameter θc then represents the bias towards or away from the formation of concurrent ties, with θc < 0 representing the case of concurrency penalties typically seen in sexual contact networks (Morris and Kretzschmar, 1995).
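The verbal definition above (for each vertex, count the incident edges after the first) translates directly into code. This is a small illustrative sketch; the function names `degrees`, `t_e`, and `t_c` are chosen here for readability and are not statnet's implementation. Graphs are assumed to be given as lists of vertex pairs on vertices 0, ..., n−1.

```python
def degrees(edges, n):
    """Degree of each vertex 0..n-1 in an undirected simple graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

def t_e(edges, n):
    """Edge count statistic."""
    return len(edges)

def t_c(edges, n):
    """Concurrent tie statistic: edges beyond the first, summed over vertices."""
    return sum(max(0, d - 1) for d in degrees(edges, n))

# A two-edge path 0-1-2 plus an isolated vertex 3: only vertex 1 has a
# second edge, so there is exactly one concurrent tie.
path = [(0, 1), (1, 2)]
print(t_e(path, 4), t_c(path, 4))  # prints: 2 1
```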
1.2 Edge/Isolate Family
A bias towards or away from isolate formation is commonly captured by inclusion of a statistic, tI, counting the number of isolates in the graph. Setting θI = (θe, θI, θr) and tI = (te, tI, tr) (with te and tr respectively representing the edge count and any additional statistics) and taking Pr(Y = y|θI, X) as per Equation 1 then defines the family of regularly extended edge/isolate models. Intuitively, θI < 0 implies the case of isolation avoidance (studied e.g., by Butts (2015)), while θI > 0 implies a higher incidence of isolates than would be expected given the other model terms.
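To make the sign convention concrete, the expected number of isolates can be computed exactly on a tiny graph for several values of the isolate parameter. The sketch below assumes a model with only edge and isolate terms (no additional statistics), N = 4, and an arbitrary edge parameter of −1; the function names are hypothetical.

```python
from itertools import combinations, product
from math import exp

def isolate_count(edges, n):
    """Number of vertices with no incident edge."""
    touched = {v for e in edges for v in e}
    return n - len(touched)

def expected_isolates(n, theta_e, theta_i):
    """Exact expected isolate count under an edge/isolate model,
    computed by enumerating every graph on n vertices."""
    dyads = list(combinations(range(n), 2))
    num = den = 0.0
    for mask in product((0, 1), repeat=len(dyads)):
        edges = [d for d, bit in zip(dyads, mask) if bit]
        iso = isolate_count(edges, n)
        w = exp(theta_e * len(edges) + theta_i * iso)  # unnormalized weight
        num += iso * w
        den += w
    return num / den

# Negative values should suppress isolates, positive values should enhance them.
for theta_i in (-1.0, 0.0, 1.0):
    print(theta_i, round(expected_isolates(4, -1.0, theta_i), 3))
```

Because the expected value of a sufficient statistic is non-decreasing in its own parameter for an exponential family, the printed values increase with the isolate parameter.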
2 Model Equivalence
Our main result is the following:
Theorem 1 (Model Equivalence)
Let fC(y|θC) be an edge/concurrent tie ERGM pmf and fI(y|θI) an edge/isolate ERGM pmf on common support 𝓎N. For all θC, θI there exist θC′, θI′ such that fC(y|θC) = fI(y|θC′) and fI(y|θI) = fC(y|θI′) for all y ∈ 𝓎N.
Proof
Although the model families implied by tC and tI are seemingly distinct, we can prove them to be equivalent by showing that we can write tC as an affine transformation of tI and vice versa (implying that, for any θ on tC, there exists a θ′ such that θtC(y) = θ′tI(y) + k for all y in the support). Considering that te and tr appear in both vectors, the only non-trivial mapping is from concurrent ties to edges and isolates (and, respectively, from isolates to edges and concurrent ties). To that end, writing d_i(y) for the degree of vertex i in y, we observe the following:
\[ t_c(y) = \sum_{i=1}^{N} \left(d_i(y) - 1\right) + t_I(y). \]
This follows from the fact that the number of concurrent edges on any given vertex of degree greater than 0 is equal to the degree of that vertex minus 1; subtracting 1 from the degree of each vertex yields a “deficit” equal to −1 per isolate, which can be recovered by adding back the isolate count. Since the degrees sum to twice the edge count, we then have
\[ t_c(y) = \sum_{i=1}^{N} d_i(y) - N + t_I(y) = 2 t_e(y) - N + t_I(y). \]
Since N is an additive constant over the support, it has no effect on the associated ERGM pmf and may be dropped, giving us
\[ t_c(y) \doteq 2 t_e(y) + t_I(y), \]
where the ≐ relation signifies equality up to an additive constant.1 It follows immediately that
\[ t_I(y) \doteq t_c(y) - 2 t_e(y) \]
(again, we may ignore the additive constant), and hence tC = (te, tc, tr) and tI = (te, tI, tr) are related (up to a constant) via the invertible affine mapping
\[ \mathbf{t}_C \doteq \begin{pmatrix} 1 & 0 & 0 & \cdots \\ 2 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \mathbf{t}_I \]
(with the continuation being the identity matrix), the inverse being
\[ \mathbf{t}_I \doteq \begin{pmatrix} 1 & 0 & 0 & \cdots \\ -2 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \mathbf{t}_C. \]
It follows that the family of ERGMs on the order-N simple graphs with statistics tC is equivalent to the ERGM family on the same support with statistics tI, and hence for any θC there must exist a θC′ such that fC(y|θC) = fI(y|θC′) for all y ∈ 𝓎N (and, likewise, for any θI there must exist a θI′ such that fI(y|θI) = fC(y|θI′)).
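The counting argument in the proof (concurrent ties equal twice the edge count, minus N, plus the isolate count) can be checked exhaustively on small graphs. The sketch below is only a sanity check under that reading of the argument, not part of the proof; `graph_stats` and `identity_holds` are hypothetical helper names.

```python
from itertools import combinations, product

def graph_stats(edges, n):
    """Return (edge count, concurrent tie count, isolate count)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    te = len(edges)
    tc = sum(max(0, d - 1) for d in deg)
    ti = sum(1 for d in deg if d == 0)
    return te, tc, ti

def identity_holds(n):
    """Check t_c = 2 t_e - N + t_I on every simple graph with n vertices."""
    dyads = list(combinations(range(n), 2))
    for mask in product((0, 1), repeat=len(dyads)):
        edges = [d for d, bit in zip(dyads, mask) if bit]
        te, tc, ti = graph_stats(edges, n)
        if tc != 2 * te - n + ti:
            return False
    return True

# All graphs on 1 to 5 vertices (1024 graphs for n = 5).
print(all(identity_holds(n) for n in range(1, 6)))  # prints: True
```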
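Via the above, we can easily determine the mapping that takes the parameter vector from the edge/concurrent tie family to the edge/isolate family (and vice versa). Equating the transformed statistics gives us:
\[ \boldsymbol{\theta}_C^{T} \mathbf{t}_C(y) = \theta_e t_e(y) + \theta_c t_c(y) + \theta_r^{T} t_r(y) = (\theta_e + 2\theta_c)\, t_e(y) + \theta_c\, t_I(y) + \theta_r^{T} t_r(y) - \theta_c N. \]
As usual, we drop constant terms in the above, giving us the expression θC′ = (θe + 2θc, θc, θr) for the corresponding edge/isolate parameter vector. To obtain the edge/concurrent tie parameters from the edge/isolate parameters, we simply solve for θC:
\[ \boldsymbol{\theta}_I^{T} \mathbf{t}_I(y) = \theta_e t_e(y) + \theta_I t_I(y) + \theta_r^{T} t_r(y) \doteq (\theta_e - 2\theta_I)\, t_e(y) + \theta_I\, t_c(y) + \theta_r^{T} t_r(y), \]
noting as above the omission of a constant term. The resulting expression for θC is θC = (θe − 2θI, θI, θr), where θe and θI on the right-hand side are the edge/isolate parameters (this is the vector denoted θI′ in Theorem 1).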
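The parameter mapping can be verified numerically by comparing the two pmfs graph by graph. The sketch below assumes no additional statistics (tr is empty), uses N = 4 so that the support can be enumerated, and takes the mapping to be "add twice the concurrency parameter to the edge parameter" in one direction and "subtract twice the isolate parameter" in the other, as implied by the relation between the statistics. All function names and parameter values are illustrative.

```python
from itertools import combinations, product
from math import exp, isclose

def graph_stats(edges, n):
    """Return (edge count, concurrent tie count, isolate count)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return (len(edges),
            sum(max(0, d - 1) for d in deg),
            sum(1 for d in deg if d == 0))

def pmf(n, theta_e, theta_dep, family):
    """Exact pmf over all graphs on n vertices, in a fixed enumeration order.

    family == "concurrent" pairs theta_dep with t_c; otherwise with t_I.
    """
    dyads = list(combinations(range(n), 2))
    weights = []
    for mask in product((0, 1), repeat=len(dyads)):
        edges = [d for d, bit in zip(dyads, mask) if bit]
        te, tc, ti = graph_stats(edges, n)
        dep = tc if family == "concurrent" else ti
        weights.append(exp(theta_e * te + theta_dep * dep))
    z = sum(weights)
    return [w / z for w in weights]

def concurrent_to_isolate(theta_e, theta_c):
    """Edge/concurrent tie parameters -> equivalent edge/isolate parameters."""
    return theta_e + 2.0 * theta_c, theta_c

def isolate_to_concurrent(theta_e, theta_i):
    """Edge/isolate parameters -> equivalent edge/concurrent tie parameters."""
    return theta_e - 2.0 * theta_i, theta_i

theta_C = (-1.0, -0.8)  # arbitrary illustrative values
f_C = pmf(4, *theta_C, family="concurrent")
f_I = pmf(4, *concurrent_to_isolate(*theta_C), family="isolate")
print(all(isclose(a, b, rel_tol=1e-9) for a, b in zip(f_C, f_I)))
```

The dropped constant never appears in the code because it cancels in the normalization of each pmf.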
3 k-Star Representation
Since both the edge/isolate and edge/concurrent tie models can be expressed in terms of an isolate count statistic, both belong to the family of Markov graphs (Frank and Strauss, 1986) and can be given a common representation in terms of k-star statistics. This representation is especially simple for these model families, and we derive it here. For simplicity, we omit any additional terms and parameters (tr, θr); these can be “added on” to the k-star form after the fact without loss of generality.
Let ζ = (ζ0,…,ζN−1) be the vector of degree parameters, and ψ = (ψ1,…,ψN−1) the corresponding vector of k-star parameters (such that ζi is the parameter for the count of nodes having degree i, and ψi is the parameter for the count of i-stars). For a pure isolate model, with no other degree terms, we have trivially ζ0 = θI and ζi = 0 for i > 0. To find the corresponding ψ vector, we first note that the linear dependence among the degree statistics implies that the N degree parameters have only N − 1 degrees of freedom; to establish a mapping from ζ to ψ we thus begin by mapping ζ to an “isolate free” parameterization, ζ′ = (ζ′1,…,ζ′N−1), with no isolate effect (equivalently, ζ′0 = 0 by construction). This is accomplished by subtracting the isolate parameter from each other degree parameter (here 0), i.e., ζ′i = ζi − ζ0 = −θI for i = 1,…,N−1. Given ζ′, we may obtain ψ by the relation
\[ \psi_i = \sum_{j=1}^{i} (-1)^{i-j} \binom{i}{j} \zeta'_j, \tag{2} \]
itself the inverse of the well-known mapping from the k-star parameters to the (non-isolate) degree parameters given by Frank and Strauss (1986). For the special case of ζ′j = −θI for all j ≥ 1, Eq. 2 reduces to the trivial form
\[ \psi_i = (-1)^{i}\, \theta_I. \tag{3} \]
Observe that we have held out the edge term; since edges are synonymous with 1-stars, we may simply add the edge parameter to ψ1 to obtain the final parameter vector. Substituting the isolate form of each model into Eq. 3 and adding the edge parameter then gives their k-star representations: ψI = (θe − θI, θI, −θI, …) and ψC = (θe + θc, θc, −θc, …). In both families, the dependence parameter affects the successive star parameters as a constant of alternating sign.
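The alternating-sign pattern can be reproduced numerically. The following is a minimal sketch, assuming that the relation between degree and k-star parameters is the standard binomial inversion pair: each vertex of degree d is taken to contribute C(d, k) to the k-star term, and the held-out edge term is left aside. The function names are hypothetical, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from math import comb

def star_to_degree(psi):
    """Forward map: degree parameter for degree d is sum_k C(d, k) * psi_k."""
    m = len(psi)
    return [sum(comb(d, k) * psi[k - 1] for k in range(1, d + 1))
            for d in range(1, m + 1)]

def degree_to_star(zeta_prime):
    """Inverse map (binomial inversion): zeta_prime[j-1] is the isolate-free
    parameter for degree j; the result holds psi_1, ..., psi_m."""
    m = len(zeta_prime)
    return [sum((-1) ** (i - j) * comb(i, j) * zeta_prime[j - 1]
                for j in range(1, i + 1))
            for i in range(1, m + 1)]

theta_i = Fraction(7, 10)        # illustrative isolate parameter
zeta_prime = [-theta_i] * 6      # constant isolate-free degree parameters
psi = degree_to_star(zeta_prime)

# The star parameters alternate in sign with constant magnitude theta_i.
print([str(p) for p in psi])
```

Applying `star_to_degree` to the result recovers the constant degree parameters, confirming that the two maps are inverses on this example.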
4 Discussion
The most immediate implication of Theorem 1 is that the apparently distinct phenomena of isolation avoidance and suppression of concurrent edges have equivalent effects on graph structure; the two effects cannot be separated on the basis of cross-sectional data, and indeed isolate count and concurrent tie statistics cannot be jointly identified in an ERGM with a free edge parameter. Pragmatically, this implies that whichever representation is more mathematically and/or computationally convenient can be employed when using these models. Less sanguinely, this also underscores a basic limitation on the possibility of distinguishing certain processes on the basis of single graph realizations.
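The joint non-identifiability can be seen as a rank deficiency: over the whole support, the columns for a constant, the edge count, the concurrent tie count, and the isolate count are linearly dependent. The sketch below checks this on N = 4 using exact Gauss-Jordan elimination over the rationals; it is an illustration with hypothetical helper names, not a diagnostic taken from any ERGM package.

```python
from fractions import Fraction
from itertools import combinations, product

def graph_stats(edges, n):
    """Return (edge count, concurrent tie count, isolate count)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return (len(edges),
            sum(max(0, d - 1) for d in deg),
            sum(1 for d in deg if d == 0))

def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(len(mat)):
            if r != rank and mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank

def design_rows(n, columns):
    """One row of the requested statistics per graph on n vertices."""
    dyads = list(combinations(range(n), 2))
    rows = []
    for mask in product((0, 1), repeat=len(dyads)):
        edges = [d for d, bit in zip(dyads, mask) if bit]
        te, tc, ti = graph_stats(edges, n)
        full = {"const": 1, "edges": te, "concurrent": tc, "isolates": ti}
        rows.append([full[c] for c in columns])
    return rows

# Four columns but only rank 3: one of the terms is redundant.
print(matrix_rank(design_rows(4, ["const", "edges", "concurrent", "isolates"])))
print(matrix_rank(design_rows(4, ["const", "edges", "isolates"])))
```

Dropping either the isolate or the concurrent tie column leaves a full-rank set, which matches the statement that either representation alone may be used.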
The Markov graph expression of the isolation and concurrent tie families is interesting for its archetypal simplicity: the presence of a constant magnitude star parameter with alternating sign is an instantly recognizable signature of these families, and may provide a clue to identifying them, e.g., from crude initial estimates of k-star parameters (obtained, e.g., via maximum pseudo-likelihood estimation). Interestingly, this pattern also provides insight into a third seemingly distinct family, the models formed via geometrically weighted degree (GWD) terms (Hunter, 2007). GWD applies a weighted parameter to the ith degree statistic (for i ≥ 1) given by θGWD exp(θs)(1 − (1 − exp(−θs))^i), where θGWD is a regular parameter and θs is a curved parameter (often called a “decay parameter”) governing the shape of the degree weighting function. Although it is often assumed that θs is positive, how does GWD behave when θs = 0? Since exp(0) = 1 and (1 − exp(−θs))^i = 0 for i ≥ 1 when θs = 0, the GWD coefficients in this case are uniformly equal to θGWD, with corresponding k-star parameters (θGWD, −θGWD, θGWD, …). From Eq. 3, this is immediately recognizable as an isolate model (and hence equivalently a concurrent edge model) with parameter θI = −θGWD. The equivalence of the GWD model with θs = 0 and the isolate model seems not to be widely known (though a very similar result is derived by Hunter and Handcock (2006) for an extended family containing the GWD models), and the relationship of both to the concurrent tie model seems not previously to have been noted. This provides yet another way to interpret all three model families,2 and implies the practical caution that GWD families will become poorly identified as θs → 0 when isolate or concurrent edge terms are included in the model.
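The behavior of the GWD degree weights at a zero decay parameter is easy to confirm directly from the weighting expression quoted above. The sketch below simply evaluates that expression; `gwd_weight` is a hypothetical helper, not a function from the ergm package, and the parameter values are arbitrary.

```python
from math import exp

def gwd_weight(i, theta_gwd, theta_s):
    """Weight applied to the count of degree-i vertices (i >= 1):
    theta_gwd * exp(theta_s) * (1 - (1 - exp(-theta_s)) ** i)."""
    return theta_gwd * exp(theta_s) * (1.0 - (1.0 - exp(-theta_s)) ** i)

# With theta_s = 0 every non-isolate degree gets the same weight, theta_gwd,
# so the term only distinguishes isolates from non-isolates.
flat = [gwd_weight(i, 1.5, 0.0) for i in range(1, 8)]
print(flat)

# With a positive decay parameter the weights increase with degree.
curved = [round(gwd_weight(i, 1.5, 1.0), 4) for i in range(1, 8)]
print(curved)
```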
Beyond their formal utility, the present results also provide substantive intuition for the link between isolation avoidance and concurrency penalties.3 One schematic view of these processes is illustrated in Figure 1. We consider the network in terms of three classes of nodes–isolates, pendants, and nodes with concurrent ties (“concurrent nodes”)–at constant density. Under conditions of isolation avoidance, isolates exert an effective “pull” on edges; since these edges cannot come from pendants (each of which exerts an equally strong hold on its sole tie as the isolates’ attractive force), they must be drawn from those incident upon concurrent nodes (figure, arrow A). This transfer of edges converts both isolates (by edge addition) and concurrent nodes (by edge removal) into pendants (figure, arrows B and C). Now consider the case of concurrency penalty. Here, concurrent ties are “repelled” from high degree nodes; since these edges cannot be redirected to pendants (who resist becoming concurrent with equal strength), they must necessarily be allocated to the set of isolates (arrow A). This once more converts both isolates and concurrent nodes into pendants (arrows B and C). Although the force of edge reallocation is conceived of as a “pull” in the case of isolation avoidance and a “push” in the case of concurrency penalty, it can be seen as a common underlying driver of network structure. Interestingly, the link between increasing concurrency and enhancing the number of isolates was noted in an early simulation study by Morris and Kretzschmar (1995) using a dynamic network model; we have here demonstrated how this connection necessarily arises via the transfer of nodes into/out of the pendant class.
Figure 1.
Schematic illustrating structural mechanisms in the edge/isolate and edge/concurrent tie models. Under conditions of isolation/concurrency penalty at constant density, edges are drawn from vertices of degree > 1 and transferred to isolates (A). This results in a net transfer of isolates to the class of pendants (B) due to edge gain, as well as a transfer of concurrent vertices to the class of pendants (C) due to edge loss. “Repulsion” of edges from concurrent nodes (concurrency penalty) and “attraction” of edges to isolates lead to the same equilibrium state.
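The reallocation picture can be illustrated on a toy example by holding the number of edges fixed and comparing expected class sizes with and without an isolate penalty. The sketch below assumes N = 6 vertices, exactly 3 edges, and an isolate parameter of −2; these values and the function names are arbitrary choices for illustration, and the enumeration is exact rather than simulated.

```python
from itertools import combinations
from math import exp

def class_counts(edges, n):
    """Return (isolates, pendants, concurrent nodes) for one graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return (sum(1 for d in deg if d == 0),
            sum(1 for d in deg if d == 1),
            sum(1 for d in deg if d > 1))

def expected_classes(n, m, theta_i):
    """Expected class sizes over all graphs with n vertices and exactly m
    edges, each graph weighted by exp(theta_i * isolate count)."""
    dyads = list(combinations(range(n), 2))
    totals = [0.0, 0.0, 0.0]
    z = 0.0
    for edges in combinations(dyads, m):  # constant density: m edges
        counts = class_counts(edges, n)
        w = exp(theta_i * counts[0])
        z += w
        for k in range(3):
            totals[k] += w * counts[k]
    return tuple(t / z for t in totals)

baseline = expected_classes(6, 3, 0.0)
penalized = expected_classes(6, 3, -2.0)
print("baseline :", [round(x, 3) for x in baseline])
print("penalized:", [round(x, 3) for x in penalized])
```

In this example the penalized model has fewer expected isolates and concurrent nodes and more expected pendants than the baseline, in line with arrows A through C of the schematic.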
5 Conclusion
The edge/isolate and edge/concurrent tie families are motivated by very different social mechanisms, but can be seen to be formally equivalent. Both are also equivalent to a special case of the geometrically weighted degree models in which the weight decay parameter, θs, is equal to zero. Intuitively, these families can be seen as implementing an edge reallocation force that (in the negative parameter case) shifts edges from concurrent nodes to isolates, converting members of both groups to pendants. In the positive parameter case this force works in reverse, enhancing the counts of isolates and concurrent nodes relative to pendants. Although the isolation and concurrent tie parameters do have different effects on density, these cannot be discerned in the typical case of a free edge parameter; as such, it is not generally possible to distinguish isolation and concurrent tie effects (nor GWD effects with θs ≈ 0) on the basis of cross-sectional network data.
This surprising equivalence among model classes raises the question of what other equivalences exist among apparently distinct classes of structural biases. Such equivalences reveal hidden commonalities among the seeming diversity of potential social mechanisms, and their identification would seem to be an important target for further research.
Footnotes
This work is based on research supported by NSF award #IIS-1251267 and NIH award #1R01HD068395-01.
1 I.e., f(x) ≐ g(x) ⇔ f(x) = g(x) + k, where k is some quantity that is constant with respect to x.
2 Since the alternating k-stars of Snijders et al. (2006) are equivalent to the GWD terms up to an edge parameter (Hunter, 2007), the current results apply to models based on the former terms as well.
3 A similar process operates in reverse, mutatis mutandis, in the case of isolation and/or concurrency enhancement.
References
- Butts CT. Bernoulli graph bounds for general random graphs. Sociological Methodology. 2011;41:299–345.
- Butts CT. A novel simulation method for binary discrete exponential families, with application to social networks. Journal of Mathematical Sociology. 2015 doi: 10.1080/0022250X.2015.1022279. forthcoming.
- Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81:832–842.
- Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M. statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software. 2008;24(1):1–11. doi: 10.18637/jss.v024.i01.
- Hunter DR. Curved exponential family models for social networks. Social Networks. 2007;29:216–230. doi: 10.1016/j.socnet.2006.08.005.
- Hunter DR, Handcock MS. Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics. 2006;15:565–583.
- Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software. 2008;24(3) doi: 10.18637/jss.v024.i03.
- Morris M, Kretzschmar M. Concurrent partnerships and transmission dynamics in networks. Social Networks. 1995;17(3):299–318.
- Robins GL, Morris M. Advances in exponential random graph (p*) models. Social Networks. 2007;29:169–172. doi: 10.1016/j.socnet.2006.08.001.
- Robins GL, Pattison PE. Interdependencies and social processes: Dependence graphs and generalized dependence structures. In: Carrington PJ, Scott J, Wasserman S, editors. Models and Methods in Social Network Analysis. chapter 10. Cambridge University Press; Cambridge: 2005. pp. 192–214.
- Schweinberger M. Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association. 2011 doi: 10.1198/jasa.2011.tm10747.
- Snijders TAB. Conditional marginalization for exponential random graph models. Journal of Mathematical Sociology. 2010;34:239–252.
- Snijders TAB, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociological Methodology. 2006;36
- Wasserman S, Pattison PE. Logit models and logistic regressions for social networks: I. an introduction to Markov graphs and p*. Psychometrika. 1996;60:401–426.
- Wasserman S, Robins GL. An introduction to random graphs, dependence graphs, and p*. In: Carrington PJ, Scott J, Wasserman S, editors. Models and Methods in Social Network Analysis. chapter 10. Cambridge University Press; Cambridge: 2005. pp. 192–214.