Author manuscript; available in PMC: 2014 Mar 25.
Published in final edited form as: Electron J Stat. 2012;6:1100–1128. doi: 10.1214/12-EJS696

Exponential-family random graph models for valued networks

Pavel N. Krivitsky
PMCID: PMC3964598  NIHMSID: NIHMS469413  PMID: 24678374

Abstract

Exponential-family random graph models (ERGMs) provide a principled and flexible way to model and simulate features common in social networks, such as propensities for homophily, mutuality, and friend-of-a-friend triad closure, through choice of model terms (sufficient statistics). However, those ERGMs modeling the more complex features have, to date, been limited to binary data: presence or absence of ties. Thus, analysis of valued networks, such as those where counts, measurements, or ranks are observed, has necessitated dichotomizing them, losing information and introducing biases.

In this work, we generalize ERGMs to valued networks. Focusing on modeling counts, we formulate an ERGM for networks whose ties are counts and discuss issues that arise when moving beyond the binary case. We introduce model terms that generalize and model common social network features for such data and apply these methods to a network dataset whose values are counts of interactions.

Keywords and phrases: p-star model, transitivity, weighted network, count data, maximum likelihood estimation, Conway–Maxwell–Poisson distribution

1. Introduction

Networks are used to represent and analyze phenomena ranging from sexual partnerships (Morris and Kretzschmar, 1997), to advice giving in an office (Lazega and Pattison, 1999), to friendship relations (Goodreau, Kitts and Morris, 2008; Newcomb, 1961), to international relations (Ward and Hoff, 2007), to scientific collaboration, and many other domains (Goldenberg et al., 2009). More often than not, the relations of interest are not strictly dichotomous: the relations that are present are not all effectively equal to each other. For example, in sexual partnership networks, some ties are short-term while others are long-term or marital; friendships and acquaintance have degrees of strength, as do international relations; and while a particular individual seeking advice might seek it from some coworkers but not others, he or she will likely do so in some specific order and weigh the advice of some more than that of others.

Network data with valued relations come in many forms. Observing messages (Freeman and Freeman, 1980; Diesner and Carley, 2005), instances of personal interaction (Bernard, Killworth and Sailer, 1979–1980), or counting co-occurrences or common features of social actors (Zachary, 1977; Batagelj and Mrvar, 2006) produce relations in the form of counts. Measurements, such as duration of interaction (Wyatt, Choudhury and Bilmes, 2009) or volume of trade (Westveld and Hoff, 2011) produce relations in the form of (effectively) continuous values. Observations of states of alliance and war (Read, 1954) produce signed relationships. Sociometric surveys often produce ranks in addition to binary measures of affection (Sampson, 1968; Newcomb, 1961; Bernard, Killworth and Sailer, 1979–1980; Harris et al., 2003).

Exponential-family random graph models (ERGMs) are generative models for networks which postulate an exponential family over the space of networks of interest (Holland and Leinhardt, 1981; Frank and Strauss, 1986), specified by their sufficient statistics (Morris, Handcock and Hunter, 2008), or, as with Frank and Strauss (1986), by their conditional independence structure leading to sufficient statistics (Besag, 1974). These sufficient statistics typically embody the features of the network of interest that are believed to be significant to the social process that produced it, such as degree distribution (e.g., propensity towards monogamy in sexual partnership networks), homophily (i.e., “birds of a feather flock together”), and triad-closure bias (i.e., “a friend of a friend is a friend”) (Morris, Handcock and Hunter, 2008).

A major limitation of ERGMs to date has been that they have been applied almost exclusively to binary relations: a relationship between a given actor i and a given actor j is either present or absent. This is a serious limitation: valued network data have to be dichotomized for ERGM analysis, an approach which loses information and may introduce biases (Thomas and Blitzstein, 2011).

Some extensions of ERGMs to specific forms of valued ties have been formulated: to networks with polytomous tie values, represented as a constrained three-way binary array by Robins, Pattison and Wasserman (1999) and more directly by Wyatt, Choudhury and Bilmes (2009; 2010); and to multiple binary networks by Pattison and Wasserman (1999). We are also aware of some preliminary work by Handcock (2006) on ERGMs for signed network data. Rinaldo, Fienberg and Zhou (2009) discussed binary ERGMs as a special case and a motivating application of their developments in the geometry of discrete exponential families.

A broad exception to this limitation has been a subfamily of ERGMs that have the property that the ties and their values are stochastically independent given the model parameters. Unlike the dependent case, the likelihoods for these models can often be expressed as generalized linear or nonlinear models, and they tend to have tractable normalizing constants, which allows them to be embedded more easily in a hierarchical framework. Thus, to represent common properties of social networks, such as actor heterogeneity, triad-closure bias, and clustering, latent class and position models have been used and extended to valued networks (Hoff, 2005; Krivitsky et al., 2009; Mariadassou, Robin and Vacher, 2010).

In this work, we generalize the ERGM framework to directly model valued networks, particularly networks with count dyad values, while retaining much of the flexibility and interpretability of binary ERGMs, including the above-described property in the case when tie values are independent under the model. In Section 2, we review conventional ERGMs and describe their traits that valued ERGMs should inherit. In Section 3, we describe the framework that extends the model class to networks with counts as dyad values and discuss additional considerations that emerge when each dyad’s sample space is no longer binary. In Section 4, we give some details and caveats of our implementation of these models and briefly address the issue of ERGM degeneracy as it pertains to count data. Applying ERGMs requires one to specify and interpret sufficient statistics that embody network features of interest, all the while avoiding undesirable phenomena such as ERGM degeneracy. Thus, in Section 5, we introduce and discuss statistics to represent a variety of features commonly found in social networks, as well as features specific to networks of counts. In Section 6 we use these statistics to model social forces that affect the structure of a network of counts of conversations among members of a fraternity. Finally, in Section 7, we discuss generalizing ERGMs to other types of valued data.

2. ERGMs for binary data

In this section, we define notation, review the (potentially curved) exponential-family random graph model and identify those of its properties that we wish to retain when generalizing.

2.1. Notation and binary ERGM definition

Let N be the set of actors in the network of interest, assumed known and fixed for the purposes of this paper, and let n ≡ |N| be its cardinality, or the number of actors in the network. For the purposes of this paper, let a dyad be defined as a (usually distinct) pair of actors, ordered if the network of interest is directed, unordered if not, between whom a relation of interest may exist, and let 𝕐 be the set of all dyads. More concretely, if the network of interest is directed, 𝕐 ⊆ N × N, and if it is not, 𝕐 ⊆ {{i, j} : (i, j) ∈ N × N}. In many problems, a relation of interest cannot exist between an actor and itself (e.g., a friendship network), or actors are partitioned into classes with relations only existing between classes (e.g., bipartite networks of actors attending events), in which case 𝕐 is a proper subset of N × N, excluding those pairs (i, j) between which there can be no relation of interest.

Further, let the set of possible networks of interest (the sample space of the model) be $\mathcal{Y}\subseteq 2^{\mathbb{Y}}$, the power set of the dyads in the network. Then a network $y\in\mathcal{Y}$ can be considered a set of ties $(i,j)$. Again, in some problems, there may be additional constraints on $\mathcal{Y}$. Degree constraints induced by the survey format are a common example (Harris et al., 2003; Goodreau, Kitts and Morris, 2008).

Using notation similar to that of Hunter and Handcock (2006) and Krivitsky, Handcock and Morris (2011), an exponential-family random graph model has the form

$$\Pr_{\theta;\eta,g}(Y=y\mid x)=\frac{\exp\left(\eta(\theta)\cdot g(y;x)\right)}{\kappa_{\eta,g}(\theta;x)},\quad y\in\mathcal{Y}, \tag{1}$$

for random network variable $Y$ and its realization $y$; model parameter vector $\theta\in\Theta$ (for parameter space $\Theta\subseteq\mathbb{R}^q$) and its mapping to canonical parameters $\eta:\Theta\to\mathbb{R}^p$; a vector of sufficient statistics $g:\mathcal{Y}\to\mathbb{R}^p$, which may also depend on data $x$, assumed fixed and known; and a normalizing constant (in $y$) $\kappa_{\eta,g}:\mathbb{R}^q\to\mathbb{R}$, which ensures that the probabilities in (1) sum to 1 and thus has the value

$$\kappa_{\eta,g}(\theta;x)=\sum_{y\in\mathcal{Y}}\exp\left(\eta(\theta)\cdot g(y;x)\right).$$

Here, we have given the most general case defined by Hunter and Handcock (2006). Usually, $q=p$ and $\eta(\theta)\equiv\theta$, so the exponential family is linear. For notational simplicity, we will omit $x$ for the remainder of this paper, as $g$ incorporates it implicitly.

2.2. Properties of binary ERGM

2.2.1. Conditional distributions and change statistics

Snijders et al. (2006), Hunter et al. (2008), Krivitsky, Handcock and Morris (2011), and others define change statistics or change scores, which emerge when considering the probability of a single dyad having a tie given the rest of the network and provide a convenient local interpretation of ERGMs. To summarize, define the p-vector of change statistics

$$\Delta_{i,j}g(y)\equiv g\left(y+(i,j)\right)-g\left(y-(i,j)\right),$$

where y + (i, j) is the network y with edge or arc (i, j) added if absent (and unchanged if present) and y − (i, j) is the network y with edge or arc (i, j) removed if present (and unchanged if absent). Then, through cancellations,

$$\Pr_{\theta;\eta,g}\left(Y_{i,j}=1\mid Y_{-(i,j)}=y_{-(i,j)}\right)=\operatorname{logit}^{-1}\left(\eta(\theta)\cdot\Delta_{i,j}g(y)\right).$$

It is often the case that the form of $\Delta_{i,j}g(y)$ is simpler than that of $g(y)$, both algebraically and computationally. For example, the change statistic for the edge count $|y|$ is simply 1, indicating that a unit increase in $\eta_{|y|}(\theta)$ will increase the conditional log-odds of a tie by 1, while the change statistic for the number of triangles in a network is $|y_i\cap y_j|$, the number of neighbors $i$ and $j$ have in common (writing $y_i$ for the set of neighbors of $i$), suggesting that a positive coefficient on this statistic will increase the odds of a tie between $i$ and $j$ exponentially in the number of common neighbors. Hunter et al. (2008) and Krivitsky, Handcock and Morris (2011) offer further discussion of change statistics and their uses, and Snijders et al. (2006) and Schweinberger (2011) use them to diagnose degeneracy in ERGMs. It would be desirable for a generalization of ERGM to valued networks to facilitate similar local interpretations.
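As a quick numerical check of these change statistics, the following minimal Python sketch computes $\Delta_{i,j}g(y)$ for the edge and triangle counts of a small undirected graph, both from the closed forms above and by brute force from the definition $g(y+(i,j))-g(y-(i,j))$, and then evaluates the conditional tie probability. The adjacency-set representation, the toy graph, the coefficient values, and all function names are hypothetical choices for illustration; this is not the ergm package's API.

```python
from itertools import combinations
import math


def edge_count(adj):
    """Number of undirected edges; adj maps node -> set of neighbours."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def triangle_count(adj):
    """Number of triangles, by brute force over node triples."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def toggled(adj, i, j, present):
    """Copy of adj with edge {i, j} forced present (True) or absent (False)."""
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    if present:
        new[i].add(j)
        new[j].add(i)
    else:
        new[i].discard(j)
        new[j].discard(i)
    return new


def change_stats(adj, i, j):
    """Closed-form change statistics (edges, triangles) for dyad {i, j}."""
    common = (adj[i] & adj[j]) - {i, j}  # common neighbours of i and j
    return (1, len(common))


def change_stats_brute(adj, i, j):
    """g(y + (i,j)) - g(y - (i,j)) computed directly from the definition."""
    plus, minus = toggled(adj, i, j, True), toggled(adj, i, j, False)
    return (
        edge_count(plus) - edge_count(minus),
        triangle_count(plus) - triangle_count(minus),
    )


def conditional_tie_prob(theta, delta):
    """logit^{-1}(theta . delta): Pr(Y_ij = 1 | rest of the network)."""
    eta = sum(t * d for t, d in zip(theta, delta))
    return 1.0 / (1.0 + math.exp(-eta))


# Toy undirected graph on nodes 0..4
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

delta = change_stats(adj, 0, 1)
print(delta, change_stats_brute(adj, 0, 1))  # both (1, 2): 0 and 1 share neighbours 2 and 3
print(conditional_tie_prob((-2.0, 0.5), delta))
```

Because $y+(i,j)$ and $y-(i,j)$ both fix the focus dyad, the result does not depend on whether $(i,j)$ is currently a tie, which is what makes the logit form above possible.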

Furthermore, the conditional distribution serves as the basis for maximum pseudo-likelihood estimation (MPLE) for these models (Strauss and Ikeda, 1990).

2.2.2. Relationship to logistic regression

If the model has the property of dyadic independence discussed in the Introduction, or, equivalently, the change statistic $\Delta_{i,j}g(y)$ is constant in $y$ (but may vary for different $(i,j)$), the model trivially reduces to logistic regression. In that case, the MLE and the MPLE are equivalent (Strauss and Ikeda, 1990). It would likewise be desirable for valued generalizations of ERGMs to reduce to GLMs for dyad-independent choices of sufficient statistics.
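To make the reduction concrete, here is a hedged sketch for an assumed dyad-independent model with two statistics, the edge count and the number of within-group edges (uniform homophily), on a hypothetical two-group network. Because each dyad's change statistic is the constant covariate vector $(1, \text{same group})$, the MLE (and hence the MPLE) is the logistic-regression fit; with a single binary covariate that fit has a closed form that matches the observed tie proportions between and within groups. Group labels, sizes, and coefficients are made up for illustration.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_homophily_network(n, theta, rng):
    """Dyad-independent binary ERGM with statistics (edges, same-group edges).

    Actors 0..n-1 are split into two groups (even/odd labels, hypothetical).
    Each undirected dyad is an independent Bernoulli with log-odds
    theta[0] + theta[1] * same_group. Returns a list of (same_group, tie).
    """
    dyads = []
    for i in range(n):
        for j in range(i + 1, n):
            same = int(i % 2 == j % 2)
            p = 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * same)))
            dyads.append((same, int(rng.random() < p)))
    return dyads


def fit_edges_homophily(dyads):
    """Closed-form MLE (= MPLE here) for the two-statistic model.

    With one binary covariate, the logistic-regression MLE reproduces the
    observed tie proportion in each covariate class.
    """
    between = [tie for same, tie in dyads if same == 0]
    within = [tie for same, tie in dyads if same == 1]
    theta_edges = logit(sum(between) / len(between))
    theta_homophily = logit(sum(within) / len(within)) - theta_edges
    return theta_edges, theta_homophily


rng = random.Random(1)
dyads = simulate_homophily_network(80, (-2.0, 1.5), rng)
print(fit_edges_homophily(dyads))  # close to (-2.0, 1.5)
```

The closed form breaks down if either class has no ties or only ties (the logit is then infinite), a case in which the logistic MLE does not exist either.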

3. ERGM for counts

We now define ERGMs for count data and discuss the issues that arise in the transition.

3.1. Model definition

Define $N$, $n$, and $\mathbb{Y}$ as above. Let $\mathbb{N}_0$ be the set of natural numbers together with 0. Here, we focus on counts with no a priori upper bound, or counts best modeled as such. Instead of defining the sample space $\mathcal{Y}$ as a subset of a power set, define it as $\mathcal{Y}\subseteq\mathbb{N}_0^{\mathbb{Y}}$, a set of mappings that assign to each dyad $(i,j)\in\mathbb{Y}$ a count. Let $y_{i,j}=y(i,j)\in\mathbb{N}_0$ be the value associated with dyad $(i,j)$.

A (potentially curved) ERGM for a random network of counts Y ∈ 𝒴 then has the pmf

$$\Pr_{\theta;h,\eta,g}(Y=y)=\frac{h(y)\exp\left(\eta(\theta)\cdot g(y)\right)}{\kappa_{h,\eta,g}(\theta)}, \tag{2}$$

where the normalizing constant

$$\kappa_{h,\eta,g}(\theta)=\sum_{y\in\mathcal{Y}}h(y)\exp\left(\eta(\theta)\cdot g(y)\right),$$

with η, g, and θ defined as above, and

$$\Theta\subseteq\Theta_N=\left\{\theta'\in\mathbb{R}^q:\kappa_{h,\eta,g}(\theta')<\infty\right\} \tag{3}$$

(Barndorff-Nielsen, 1978, pp. 115–116; Brown, 1986, pp. 1–2), with $\Theta_N$ being the natural parameter space if the ERGM is linear. Notably, while (3) is trivial for binary networks because their sample space is finite, for counts it can be a fairly complex constraint.

For the remainder of this paper, we will focus on linear ERGMs, so unless otherwise noted, p = q and η(θ) ≡ θ.

3.2. Reference measure

In addition to the specification of the sufficient statistics g and, for curved families, mapping η of model parameters to canonical parameters, an ERGM for counts depends on the specification of the function h : 𝒴 → [0, ∞). Formally, along with the sample space, it specifies the reference measure: the distribution relative to which the exponential form is specified. For binary ERGMs, h is usually not specified explicitly, though in some ERGM applications, such as models with offsets (Krivitsky, Handcock and Morris, 2011, for example) and profile likelihood calculations of Hunter et al. (2008), the terms with fixed parameters are implicitly absorbed into h.

For valued network data in general, and for count data in particular, specification of $h$ gains a great deal of importance, setting the baseline shape of the dyad distribution and constraining the parameter space. Consider a very simple $p=1$ model with $g(y)=\left(\sum_{(i,j)\in\mathbb{Y}}y_{i,j}\right)$, the sum of all dyad values. If $h(y)=1$ (i.e., discrete uniform), the resulting family has the pmf

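$$\Pr_{\theta;h,\eta,g}(Y=y)=\frac{\exp\left(\theta\sum_{(i,j)\in\mathbb{Y}}y_{i,j}\right)}{\kappa_{h,\eta,g}(\theta)}=\prod_{(i,j)\in\mathbb{Y}}\exp(\theta y_{i,j})\left(1-\exp(\theta)\right),$$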

giving the dyadwise distribution $Y_{i,j}\overset{\text{i.i.d.}}{\sim}\operatorname{Geometric}\left(p=1-\exp(\theta)\right)$, with $\theta<0$ by (3). On the other hand, suppose that, instead, $h(y)=\prod_{(i,j)\in\mathbb{Y}}(y_{i,j}!)^{-1}$. Then,

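$$\Pr_{\theta;h,\eta,g}(Y=y)=\frac{\exp\left(\theta\sum_{(i,j)\in\mathbb{Y}}y_{i,j}\right)}{\kappa_{h,\eta,g}(\theta)\prod_{(i,j)\in\mathbb{Y}}y_{i,j}!}=\prod_{(i,j)\in\mathbb{Y}}\frac{\exp(\theta y_{i,j})}{y_{i,j}!}\exp\left(-\exp(\theta)\right),$$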

giving $Y_{i,j}\overset{\text{i.i.d.}}{\sim}\operatorname{Poisson}\left(\mu=\exp(\theta)\right)$, with $\Theta_N=\mathbb{R}$. The shapes of the resulting distributions for a fixed mean are shown in Figure 1.

Fig 1. Effect of h on the shape of the distribution. (The mean is fixed at 2.)

The reference measure $h$ thus determines the support and the basic shape of the ERGM distribution. For this reason, we define a geometric-reference ERGM to have the form (2) with $h(y)=1$ and a Poisson-reference ERGM to have $h(y)=\prod_{(i,j)\in\mathbb{Y}}(y_{i,j}!)^{-1}$.
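The following sketch, in plain Python, evaluates the dyadwise pmf proportional to $h(k)\exp(\theta k)$, normalized over a truncated range, under the two reference measures, with $\theta$ chosen so that both have mean 2 as in Figure 1. The truncation point and function names are assumptions for illustration; for these two cases the normalizing constants are known in closed form, so truncation is only a numerical convenience.

```python
import math


def dyad_pmf(theta, log_h, k_max=400):
    """pmf of one dyad, proportional to h(k) * exp(theta * k), truncated at k_max.

    Truncation is harmless only when the neglected tail mass is negligible.
    """
    log_w = [log_h(k) + theta * k for k in range(k_max + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # subtract the max for stability
    total = sum(w)
    return [v / total for v in w]


def mean_var(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


def geometric_ref(k):
    return 0.0  # log h(k) for h(y) = 1


def poisson_ref(k):
    return -math.lgamma(k + 1)  # log h(k) for h(y) = 1 / y!


# Choose theta so that both have mean 2 (as in Figure 1):
#   geometric: exp(theta) / (1 - exp(theta)) = 2  =>  theta = log(2/3)
#   Poisson:   exp(theta) = 2                    =>  theta = log(2)
geo = dyad_pmf(math.log(2 / 3), geometric_ref)
poi = dyad_pmf(math.log(2), poisson_ref)
print(mean_var(geo))   # approximately (2, 6)
print(mean_var(poi))   # approximately (2, 2)
print(geo[0], poi[0])  # Pr(Y = 0): 1/3 versus exp(-2)
```

At this mean the Poisson reference keeps the variance equal to the mean, whereas the geometric reference triples it and puts more mass at zero.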

Note that this does not mean that any Poisson-reference ERGM will, even under dyadic independence, be dyadwise Poisson. We discuss the sufficient conditions for this in Section 5.2.1.

4. Inference and implementation

As exponential families, valued ERGMs, and ERGMs for counts in particular, inherit the inferential properties of discrete exponential families in general and binary ERGMs in particular, including calculation of standard errors and analysis of deviance. They also inherit the caveats. For example, Wald tests based on standard errors depend on asymptotics that are questionable for ERGMs with complex dependence structure (Hunter and Handcock, 2006), so in Section 6 we confirm the most important results using a simple Monte Carlo test: we fit a nested model without the statistic of interest and simulate the distribution of that statistic under the fitted nested model. The quantile of the observed value of the statistic within this simulated distribution can then be used as a more robust P-value.
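A minimal sketch of such a Monte Carlo test follows. For illustration only, the nested model is assumed to be i.i.d. Poisson dyads (so its MLE is the mean count), the statistic of interest is the number of nonzero dyads (the zero-modification statistic of Section 5.2.2), and small values are treated as extreme; the data and function names are hypothetical. A real application would simulate from the fitted nested ERGM by MCMC.

```python
import math
import random


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def n_nonzero(y):
    return sum(1 for v in y if v > 0)


def monte_carlo_pvalue(y_obs, statistic, n_sim, rng):
    """Monte Carlo P-value for `statistic` under a fitted nested model.

    Nested model (an assumption for illustration): i.i.d. Poisson dyads,
    whose MLE is the sample mean. Small values of the statistic are extreme.
    """
    mu_hat = sum(y_obs) / len(y_obs)
    observed = statistic(y_obs)
    as_extreme = 0
    for _ in range(n_sim):
        y_sim = [poisson_draw(mu_hat, rng) for _ in y_obs]
        if statistic(y_sim) <= observed:
            as_extreme += 1
    # The +1 terms count the observed data among the simulations.
    return (as_extreme + 1) / (n_sim + 1)


# Zero-heavy counts: 80 dyads with no interaction, 20 with five interactions each.
y_obs = [0] * 80 + [5] * 20
p = monte_carlo_pvalue(y_obs, n_nonzero, n_sim=999, rng=random.Random(42))
print(p)
```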

At the same time, generalizing ERGMs to counts raises additional inferential issues. In particular, the infinite sample space of counts means that the constraint (3) is not always trivially satisfied, which results in some valued ERGM specifications not fulfilling regularity conditions. We give an example of this in Section 5.2.3 and Appendix B. Additional computational issues also arise.

4.1. Computational issues

The greatest practical difficulty associated with likelihood inference on these models is usually that the normalizing constant $\kappa_{h,\eta,g}(\theta)$ is intractable, its exact evaluation requiring summation over the entire sample space $\mathcal{Y}$. However, the exponential-family nature of the model also means that, provided a method exists to simulate realizations of networks from the model of interest given a particular $\theta$, the methods of Geyer and Thompson (1992) for fitting exponential families with intractable normalizing constants and, more specifically, their application to ERGMs by Hunter and Handcock (2006), may be used. These methods rely on network sufficient statistics rather than the networks themselves and can thus be used with little modification. More concretely, the ratio of two normalizing constants evaluated at $\theta'$ and $\theta$ can be expressed as

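$$\begin{aligned}
\frac{\kappa_{h,\eta,g}(\theta')}{\kappa_{h,\eta,g}(\theta)}
&=\sum_{y\in\mathcal{Y}}\frac{h(y)\exp\left(\eta(\theta')\cdot g(y)\right)}{\kappa_{h,\eta,g}(\theta)}\\
&=\sum_{y\in\mathcal{Y}}\frac{h(y)\exp\left((\eta(\theta')-\eta(\theta))\cdot g(y)\right)\exp\left(\eta(\theta)\cdot g(y)\right)}{\kappa_{h,\eta,g}(\theta)}\\
&=\sum_{y\in\mathcal{Y}}\exp\left((\eta(\theta')-\eta(\theta))\cdot g(y)\right)\frac{h(y)\exp\left(\eta(\theta)\cdot g(y)\right)}{\kappa_{h,\eta,g}(\theta)}\\
&=E_{\theta;h,\eta,g}\left(\exp\left((\eta(\theta')-\eta(\theta))\cdot g(Y)\right)\right),
\end{aligned}$$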

so, given a sample $Y^{(1)},\ldots,Y^{(S)}$ drawn from the model at an initial guess $\theta$, the ratio can be estimated by

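$$\frac{\kappa_{h,\eta,g}(\theta')}{\kappa_{h,\eta,g}(\theta)}\approx\frac{1}{S}\sum_{s=1}^{S}\exp\left((\eta(\theta')-\eta(\theta))\cdot g\left(Y^{(s)}\right)\right).$$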
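The estimator can be sanity-checked on a toy case where the normalizing constant is known. The sketch below assumes a Poisson-reference ERGM whose only statistic is the dyad sum, so that $\kappa_{h,\eta,g}(\theta)=\exp(|\mathbb{Y}|\exp(\theta))$ and sampling at $\theta$ amounts to drawing i.i.d. $\operatorname{Poisson}(\exp(\theta))$ dyads; in realistic models the sample would come from MCMC instead. Function names and settings are illustrative assumptions.

```python
import math
import random


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def log_kappa_ratio_mc(theta, theta_new, n_dyads, n_samples, rng):
    """Monte Carlo estimate of log(kappa(theta_new) / kappa(theta)).

    Toy model (assumed): Poisson-reference ERGM with g(y) = sum of dyad
    values and eta(theta) = theta, so sampling at theta means drawing
    i.i.d. Poisson(exp(theta)) dyads.
    """
    mu = math.exp(theta)
    log_terms = []
    for _ in range(n_samples):
        g = sum(poisson_draw(mu, rng) for _ in range(n_dyads))
        log_terms.append((theta_new - theta) * g)
    # log of the sample mean of exp(log_terms), computed stably
    top = max(log_terms)
    return top + math.log(sum(math.exp(v - top) for v in log_terms) / n_samples)


def log_kappa_ratio_exact(theta, theta_new, n_dyads):
    """Here kappa(theta) = exp(n_dyads * exp(theta)), so the log-ratio is exact."""
    return n_dyads * (math.exp(theta_new) - math.exp(theta))


est = log_kappa_ratio_mc(0.0, 0.2, 3, 20000, random.Random(7))
print(est)
print(log_kappa_ratio_exact(0.0, 0.2, 3))  # 3 * (exp(0.2) - 1), about 0.664
```

Averaging in log space avoids overflow when $\theta'$ is far from $\theta$; the estimate also becomes unreliable in that case, which is why, in practice, the approximation is typically trusted only near the current guess.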

Another method for fitting ERGMs, taking advantage of the equivalence of the method of moments to the maximum likelihood estimator for linear exponential families, was implemented by Snijders (2002), using the algorithm by Robbins and Monro (1951) for simulated statistics to fit the model. This approach also trivially extends to valued ERGMs.
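A bare-bones illustration of the stochastic-approximation idea (not Snijders' full multi-phase algorithm) follows: $\theta$ is nudged against the discrepancy between simulated and observed statistics with decreasing gains, in the manner of Robbins and Monro (1951), so that it converges to the solution of $E_\theta(g(Y))=g(y_{\text{obs}})$, where $y_{\text{obs}}$ denotes the observed network. The toy model (i.i.d. Poisson dyads with $g$ the dyad sum) and the gain sequence $1/(t+2)$ are assumptions chosen for simplicity; here the exact solution is the logarithm of the observed mean count.

```python
import math
import random


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def robbins_monro(target_mean, n_dyads, n_iter, rng, theta0=0.0):
    """Stochastic-approximation solution of E_theta[g(Y)] = g(y_obs).

    Toy model (assumed): Poisson-reference ERGM with g(y) = sum of dyad
    values, expressed per dyad, so the solution is log(target_mean).
    """
    theta = theta0
    for t in range(n_iter):
        mu = math.exp(theta)
        sim_mean = sum(poisson_draw(mu, rng) for _ in range(n_dyads)) / n_dyads
        # Decreasing gains: their sum diverges while their squares sum finitely.
        gain = 1.0 / (t + 2)
        theta -= gain * (sim_mean - target_mean)
    return theta


theta_hat = robbins_monro(2.0, n_dyads=20, n_iter=3000, rng=random.Random(3))
print(theta_hat, math.log(2.0))
```

Practical implementations typically also scale the step by an estimate of the covariance of $g$ and average the iterates.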

Furthermore, because the normalizing constant (if it is finite) is thus accommodated by the fitting algorithm, we may focus on the unnormalized density for the purposes of model specification and interpretation. Therefore, for the remainder of this paper, we specify our models up to proportionality, as Geyer (1999) suggests.

That (3) is not trivially satisfied for all $\theta\in\mathbb{R}^q$ presents an additional computational challenge: even for relatively simple network models, the natural parameter space $\Theta_N$ may have a nontrivial shape. For example, even a simple geometric-reference ERGM

$$\Pr_{\theta;h,\eta,g}(Y=y)\propto\prod_{(i,j)\in\mathbb{Y}}\exp\left(\theta\cdot(x_{i,j}y_{i,j})\right),$$

a geometric GLM with a covariate $p$-vector $x_{i,j}$, has

$$\Theta_N=\left\{\theta\in\mathbb{R}^p:\forall_{(i,j)\in\mathbb{Y}}\ \theta\cdot x_{i,j}<0\right\},$$

an intersection of up to $|\mathbb{Y}|$ half-spaces (linear constraints). Models with complex dependence structure may have less predictable parameter spaces, and, due to the nature of the algorithm of Hunter and Handcock (2006), the only general way to detect whether a guess for $\theta$ has strayed outside of $\Theta_N$ may be through diagnostics on the simulation. Bayesian inference with improper priors faces a similar problem, and addressing it in the context of ERGMs is a subject for future work. For this paper, we focus on models whose parameter spaces are provably unconstrained or have very simple constraints.
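For the geometric GLM just described, membership in $\Theta_N$ can be checked directly, dyad by dyad. A minimal sketch with hypothetical covariates (an intercept and a same-group indicator) and illustrative function names:

```python
import math


def in_natural_parameter_space(theta, covariates):
    """True iff theta . x_ij < 0 for every dyad's covariate vector x_ij.

    This is the Theta_N constraint for the geometric GLM above: each dyad
    is Geometric with success probability 1 - exp(theta . x_ij).
    """
    return all(sum(t * x for t, x in zip(theta, x_ij)) < 0 for x_ij in covariates)


def geometric_dyad_mean(theta, x_ij):
    """E[Y_ij] = exp(eta) / (1 - exp(eta)) with eta = theta . x_ij < 0."""
    eta = sum(t * x for t, x in zip(theta, x_ij))
    if eta >= 0:
        raise ValueError("theta lies outside the natural parameter space")
    return math.exp(eta) / (1.0 - math.exp(eta))


# Covariates: intercept plus a same-group indicator (two kinds of dyads).
covariates = [(1.0, 0.0), (1.0, 1.0)]
print(in_natural_parameter_space((-1.0, 0.5), covariates))  # True: -1 < 0 and -0.5 < 0
print(in_natural_parameter_space((-1.0, 2.0), covariates))  # False: -1 + 2 = 1 > 0
```

With only two distinct covariate vectors, $\Theta_N$ here is the intersection of two half-planes; for the dependent models mentioned above, no such direct check is generally available.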

We base our implementation on the 𝖱 package ergm (Handcock et al., 2012) for fitting binary ERGMs. The design of that package separates the specification of model sufficient statistics from the specification of the sample space of networks (Hunter et al., 2008), so we implement our models by substituting in a Metropolis-Hastings sampler that implements our $\mathcal{Y}$ and $h$ of interest. (A simple sampling algorithm for realizations from a Poisson-reference ERGM, optimized for zero-inflated data, is described in Appendix A.) This implementation will be incorporated into a future public release of ergm.
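To illustrate how a sampler over a non-binary sample space fits into this design, the sketch below implements a generic random-walk Metropolis-Hastings sampler for a Poisson-reference ERGM: it proposes changing one dyad value by $\pm1$, rejects negative proposals, and accepts with probability $\min\{1,\ \exp(\theta\cdot\Delta)\,y_{i,j}!/y'_{i,j}!\}$, where $\Delta$ is the resulting change in $g$. It is an assumption-laden illustration in plain Python, not the algorithm of Appendix A and not ergm's internals; the change-statistic callback and all names are hypothetical.

```python
import math
import random


def mh_poisson_reference(theta, change_stats, y0, n_samples, thin, rng):
    """Random-walk Metropolis-Hastings for a Poisson-reference ERGM.

    Target: p(y) proportional to prod_d (y_d!)^{-1} * exp(theta . g(y)).
    change_stats(y, d, new) must return g(y with y[d] = new) - g(y).
    Illustrative sampler only; not the algorithm of Appendix A.
    """
    y = list(y0)
    samples = []
    for _ in range(n_samples):
        for _ in range(thin):
            d = rng.randrange(len(y))
            new = y[d] + rng.choice((-1, 1))  # symmetric +/-1 proposal
            if new < 0:
                continue  # proposal outside the support: reject, keep y
            delta = change_stats(y, d, new)
            log_ratio = sum(t * s for t, s in zip(theta, delta))
            # Poisson reference measure contributes log(y_d!) - log(new!)
            log_ratio += math.lgamma(y[d] + 1) - math.lgamma(new + 1)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                y[d] = new
        samples.append(list(y))
    return samples


def sum_and_sqrt_change(y, d, new):
    """Change in (sum of y, sum of sqrt(y)) when y[d] is set to new."""
    return (new - y[d], math.sqrt(new) - math.sqrt(y[d]))


rng = random.Random(2012)
draws = mh_poisson_reference((math.log(2.0), 0.0), sum_and_sqrt_change,
                             y0=[0] * 5, n_samples=4000, thin=25, rng=rng)
kept = draws[200:]  # discard burn-in
print(sum(sum(s) for s in kept) / (5 * len(kept)))  # near 2: Poisson(2) dyads
```

With the coefficient on $\sum\sqrt{y_{i,j}}$ set to 0 the target is i.i.d. Poisson(2), so the sample mean offers a simple check. Because only the changed dyad enters the acceptance ratio, the same loop would serve dependent statistics as long as `change_stats` accounts for the other dyads.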

4.2. Model degeneracy

Application of ERGMs has long been associated with a complex of problems collectively referred to as “degeneracy” (Handcock, 2003; Rinaldo, Fienberg and Zhou, 2009; Schweinberger, 2011). Rinaldo, Fienberg and Zhou, in particular, list three specific, interrelated phenomena: 1) a parameter configuration (even the MLE) induces a distribution under which only a small number of possible networks have non-negligible probabilities, and these networks are often very different from each other (e.g., a sparser-than-observed graph and a complete graph), for an effectively bimodal distribution; 2) the MLE is hard to find by the available MCMC methods; and 3) the probability of the observed network under the MLE is relatively low: the observed network is, effectively, between the modes. This bimodality and concentration is often a consequence of the model inducing overly strong positive dependence among dyad values. For example, Snijders et al. (2006) use change statistics to show that under models with positive coefficients on triangle and k-star (k ≥ 2) counts (the classic “degenerate” ERGM terms), every tie added to the network increases the conditional odds of several other ties and does not decrease the odds of any, creating what Snijders et al. call an “avalanche” toward the complete graph, which emerges as by far the highest-probability realization. (More concretely, under a model with a triangle count with coefficient $\theta_\Delta$, adding a tie $(i,j)$ increases by a factor of $\exp(\theta_\Delta)$ the conditional odds of as many ties as $i$ and $j$ have neighbors.) Adjusting other parameters, such as density, downward to bring the expected sparsity close to that of the observed graph merely induces the bimodal distribution of Phenomenon 1.

An infinite sample space makes Phenomenon 1, as such, unlikely, because the “avalanche” has no maximal graph in which to concentrate. However, it does not preclude excessive dependence from inducing a bimodal distribution at the MLE, even if neither mode is remotely degenerate in the probabilistic sense. If the observed network lies between these modes, Phenomenon 3 may result, and, due to the nature of the estimation algorithms, such a situation may lead to failed estimation (Phenomenon 2).

In this work, we seek to avoid this problem by constructing statistics that prevent the “avalanche” by limiting dependence or employing counterweights to reduce it. (An example of the former approach is the modeling of transitivity in Section 5.2.6, and an example of the latter is the centering in the within-actor covariance statistic developed in Section 5.2.5.) Formal diagnostics developed to date, such as those of Schweinberger (2011), do not appear to be directly applicable to models with infinite sample spaces, so we rely on MCMC diagnostics (Goodreau et al., 2008) instead.

5. Statistics and interpretation for count data

In this section, we develop sufficient statistics for count data to represent network features that may be of interest and discuss their interpretation. In particular, unless otherwise noted, we focus on the Poisson-reference ERGM without complex constraints: $\mathcal{Y}=\mathbb{N}_0^{\mathbb{Y}}$ and $h(y)=\prod_{(i,j)\in\mathbb{Y}}(y_{i,j}!)^{-1}$.

5.1. Interpretation of model parameters

The sufficient statistics of the binary ERGMs and valued ERGMs alike embody the structural properties of the network that are of interest. The tools available for interpreting them are similar as well.

5.1.1. Expectations of sufficient statistics

In a linear ERGM, if $\Theta_N$ is an open set, then, for every $k\in 1..p$, holding $\theta_{k'}$, $k'\ne k$, fixed, it is a general exponential-family property that the expectation $E_{\theta;h,\eta,g}(g_k(Y))$ is strictly increasing in $\theta_k$ (Barndorff-Nielsen, 1978, pp. 120–121). Thus, if the statistic $g_k$ is a measurement of some feature of interest of the network (e.g., magnitude of counts, interactions between or within a group, isolates, triadic structures), a greater value of $\theta_k$ results in a distribution of networks with more of the feature measured by $g_k$ present.

5.1.2. Discrete change statistic and conditional distribution

Binary ERGM statistics have a “local” interpretation in the form of change statistics summarized in Section 2.2.1, and we describe similar tools for “local” interpretation of ERGMs for counts here.

Define the set of networks

$$\mathcal{Y}_{i,j}(y)\equiv\left\{y'\in\mathcal{Y}:\forall_{(i',j')\in\mathbb{Y}\setminus\{(i,j)\}}\ y'_{i',j'}=y_{i',j'}\right\}.$$

That is, $\mathcal{Y}_{i,j}(y)$ is the set of networks such that all dyads but the focus dyad $(i,j)$ are fixed to their values in $y$ while $(i,j)$ itself may vary over its possible values; and define $y^{(i,j)=k}\equiv\left(y'\in\mathcal{Y}_{i,j}(y):y'_{i,j}=k\right)$ to be the network with non-focus dyads fixed and focus dyad set to $k$. Then, let the discrete change statistic be

$$\Delta^{k_1\to k_2}_{i,j}g(y)\equiv g\left(y^{(i,j)=k_2}\right)-g\left(y^{(i,j)=k_1}\right).$$

This statistic emerges when taking the ratio of probabilities of two networks that are identical except for a single dyad value:

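$$\frac{\Pr_{\theta;h,\eta,g}\left(Y=y^{(i,j)=k_2}\mid Y\in\mathcal{Y}_{i,j}(y)\right)}{\Pr_{\theta;h,\eta,g}\left(Y=y^{(i,j)=k_1}\mid Y\in\mathcal{Y}_{i,j}(y)\right)}=\frac{h_{i,j}(k_2)}{h_{i,j}(k_1)}\exp\left(\theta\cdot\Delta^{k_1\to k_2}_{i,j}g(y)\right),$$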

where $h_{i,j}:\mathbb{N}_0\to\mathbb{R}$ is the component of $h$ associated with dyad $(i,j)$, such that $h(y)\equiv\prod_{(i,j)\in\mathbb{Y}}h_{i,j}(y_{i,j})$, if it can be thus factored. For a Poisson-reference ERGM, $h_{i,j}(k)=(k!)^{-1}$. This may be used to assess the effect of a particular ERGM term on the decay rate of the ratios of probabilities of successive values of dyads (Shmueli et al., 2005) and on the shape of the dyadwise conditional distribution: the conditional distribution of a dyad $(i,j)\in\mathbb{Y}$, given all other dyads $(i',j')\in\mathbb{Y}\setminus\{(i,j)\}$,

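$$\Pr_{\theta;h,\eta,g}\left(Y_{i,j}=y_{i,j}\mid Y\in\mathcal{Y}_{i,j}(y)\right)=\frac{h_{i,j}(y_{i,j})\exp\left(\theta\cdot g(y)\right)}{\sum_{y'\in\mathcal{Y}_{i,j}(y)}h_{i,j}(y'_{i,j})\exp\left(\theta\cdot g(y')\right)}=\frac{h_{i,j}(y_{i,j})\exp\left(\theta\cdot\Delta^{k_0\to y_{i,j}}_{i,j}g(y)\right)}{\sum_{k=0}^{\infty}h_{i,j}(k)\exp\left(\theta\cdot\Delta^{k_0\to k}_{i,j}g(y)\right)},$$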

for an arbitrary baseline k0.
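The dyadwise conditional distribution can be computed directly from the discrete change statistics. The sketch below assumes baseline $k_0=0$, a Poisson reference ($h_{i,j}(k)=(k!)^{-1}$), and, for concreteness, the statistics $\sum y_{i,j}$ and $\sum\sqrt{y_{i,j}}$, whose change statistics do not depend on the other dyads; the infinite sum is truncated, and the names and parameter values are illustrative.

```python
import math


def conditional_dyad_pmf(theta, delta_from_k0, log_h, k_max):
    """Pr(Y_ij = k | all other dyads), k = 0..k_max, with baseline k0 = 0.

    delta_from_k0(k) returns the discrete change statistic
    Delta_{ij}^{0 -> k} g(y) given the current values of the other dyads.
    The infinite sum in the denominator is truncated at k_max, which
    assumes the tail beyond k_max is negligible.
    """
    log_w = [log_h(k) + sum(t * s for t, s in zip(theta, delta_from_k0(k)))
             for k in range(k_max + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def log_h_poisson(k):
    return -math.lgamma(k + 1)  # log h_ij(k) for h_ij(k) = 1 / k!


def delta(k):
    # Statistics (sum of values, sum of square roots): for these,
    # Delta^{0 -> k} does not depend on the other dyads.
    return (k, math.sqrt(k))


poisson_like = conditional_dyad_pmf((math.log(3.0), 0.0), delta, log_h_poisson, 100)
print(poisson_like[2], 9 / 2 * math.exp(-3))  # equal: Poisson(3) pmf at 2
reshaped = conditional_dyad_pmf((math.log(3.0), -1.0), delta, log_h_poisson, 100)
print(reshaped[0] > poisson_like[0])  # negative sqrt coefficient raises Pr(Y_ij = 0)
```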

5.2. Model specification statistics

We now propose some specific model statistics to represent common network structural properties and distributions of counts.

5.2.1. Poisson modeling

We begin with statistics that produce Poisson-distributed dyads and model network phenomena that can be represented in a dyad-independent manner. As a binary ERGM reduces to a logistic regression model under dyadic independence, a Poisson-reference ERGM may reduce to a Poisson regression model.

In a Poisson-reference ERGM, the normalizing constant has a simple closed form if, for each dyad $(i,j)$, $g(y')$ is linear in $y'_{i,j}$ with a slope that does not depend on the values of any other dyads $y'_{i',j'}$, $(i',j')\ne(i,j)$:

$$\forall_{y\in\mathcal{Y}}:\ \Delta^{0\to y_{i,j}}_{i,j}g(y)=y_{i,j}x_{i,j}, \tag{4}$$

for $x_{i,j}\equiv\Delta^{k\to k+1}_{i,j}g(y)$ for any $k\in\mathbb{N}_0$. Then,

$$Y_{i,j}\overset{\text{ind.}}{\sim}\operatorname{Poisson}\left(\mu=\exp\left(\theta\cdot\Delta^{0\to1}_{i,j}g(y)\right)\right),$$

giving a Poisson log-linear model, in which $\Delta^{0\to1}_{i,j}g$ effectively becomes the covariate vector for $Y_{i,j}$. (If $g(y')$ is linear in $y'_{i,j}$ but does depend on other dyads, so that $x_{i,j}$ in (4) depends on $y_{-(i,j)}$ but not on $y_{i,j}$ itself, the dyad distribution is conditionally Poisson but not marginally so. An example of this arises in Section 5.2.4.)

Morris, Handcock and Hunter (2008) describe many dyad-independent sufficient statistics for binary ERGMs. All of them have the general form

$$g_k(y)\equiv\sum_{(i,j)\in\mathbb{Y}}y_{i,j}x_{i,j,k},$$

where $x_{i,j,k}\equiv\Delta_{i,j}g_k$, and $x_{i,j,k}$ may be viewed as exogenous (to the model) covariates in a logistic regression for each tie. They could then be used to model a variety of patterns for degree heterogeneity and mixing among actors over (assumed) exogenous attributes. For example, for a uniform homophily model, $x_{i,j,k}$ may be an indicator of whether $i$ and $j$ belong to the same group. If $y_{i,j}$ are counts, these statistics induce a Poisson regression type model (for a Poisson-reference ERGM), where the effect of a unit increase in some $\theta_k$ on dyad $(i,j)$ is to increase its expectation by a factor of $\exp(x_{i,j,k})$. Krivitsky et al. (2009) use this type of model for Slovenian periodical “co-readerships” (Batagelj and Mrvar, 2006), that is, the numbers of readers who report reading each pair of periodicals of interest, using as exogenous covariates the class of periodical (daily, weekly, regional, etc.) and the overall readership levels of each periodical.
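Under condition (4) such a model can be fitted as an ordinary Poisson regression. The following hedged sketch simulates counts from an assumed Poisson-reference ERGM with the dyad sum and a uniform-homophily statistic $\sum y_{i,j}\mathbb{1}_{\text{same group}}$ on a hypothetical two-group network, then recovers the coefficients; with a single binary covariate the Poisson-regression MLE has a closed form, since each class's fitted mean equals its observed mean. Group labels, sizes, and coefficients are illustrative.

```python
import math
import random


def poisson_draw(mu, rng):
    """Poisson variate by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_count_homophily(n, theta, rng):
    """Poisson-reference ERGM with g = (sum y_ij, sum y_ij * same_group).

    By condition (4) this is a Poisson log-linear model:
    Y_ij ~ Poisson(exp(theta[0] + theta[1] * same_group)), independently.
    Groups are even/odd actor labels (hypothetical). Returns (same, count).
    """
    dyads = []
    for i in range(n):
        for j in range(i + 1, n):
            same = int(i % 2 == j % 2)
            mu = math.exp(theta[0] + theta[1] * same)
            dyads.append((same, poisson_draw(mu, rng)))
    return dyads


def fit_count_homophily(dyads):
    """Closed-form MLE: each class's fitted mean equals its observed mean."""
    between = [y for same, y in dyads if same == 0]
    within = [y for same, y in dyads if same == 1]
    theta_sum = math.log(sum(between) / len(between))
    theta_homophily = math.log(sum(within) / len(within)) - theta_sum
    return theta_sum, theta_homophily


rng = random.Random(5)
dyads = simulate_count_homophily(60, (math.log(0.5), math.log(4.0)), rng)
print(fit_count_homophily(dyads))  # near (log 0.5, log 4), i.e. about (-0.69, 1.39)
```

The fitted $\exp(\hat\theta_2)$ estimates the factor by which belonging to the same group multiplies a dyad's expected count.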

Curved (i.e., $\eta(\theta)\ne\theta$, $p>q$, and $\eta$ not a linear mapping) ERGMs, in which the $g$ satisfy (4) and dyadic independence, may induce nonlinear Poisson regression. An example of this is the likelihood component of some latent space network models, with latent space positions being treated as free parameters: the likelihoods of the hierarchical models of Hoff (2005) and Krivitsky et al. (2009) are special cases of such an ERGM, with $\eta(\theta)=(\eta_{i,j}(\theta))_{(i,j)\in\mathbb{Y}}$ and $g(y)=(y_{i,j})_{(i,j)\in\mathbb{Y}}$ (i.e., the sufficient statistic is the network), and $\eta_{i,j}(\theta)$ mapping latent space positions and other parameters contained in $\theta$ to the logarithms of dyad means (i.e., the dyadwise canonical parameters).

5.2.2. Zero modification

We now turn to model terms that may reshape the distribution of the counts away from Poisson. Social networks tend to be sparse, and larger networks of similar nature tend to be more sparse (Krivitsky, Handcock and Morris, 2011). If the interactions among the actors are counted, it is often the case that if two actors interact at all, they interact multiple times. This leads to dyadwise distributions that are zero-inflated relative to Poisson.

These features of sparsity can be modeled using statistics developed for binary ERGMs, applied to a network produced by thresholding the counts (at 1, for zero-modification). For example, a Poisson-reference ERGM with p = 2 and

$$g(y)=\left(\sum_{(i,j)\in\mathbb{Y}}y_{i,j},\ \sum_{(i,j)\in\mathbb{Y}}\mathbb{1}_{y_{i,j}>0}\right)$$

has dyadwise distribution

$$\Pr_{\theta;h,\eta,g}(Y=y)\propto\prod_{(i,j)\in\mathbb{Y}}\exp\left(\theta_1y_{i,j}+\theta_2\mathbb{1}_{y_{i,j}>0}\right)/y_{i,j}!.$$

This is a parametrization of a zero-modified Poisson distribution (Lambert, 1992), though not a commonly used one, with the probability of 0 being $\left(1+\exp(\theta_2)\left(\exp(\exp(\theta_1))-1\right)\right)^{-1}$ and nonzero values being distributed (conditionally on not being 0) as a zero-truncated $\operatorname{Poisson}(\mu=\exp(\theta_1))$, both reducing to the Poisson case when $\theta_2=0$. Notably, the probability of 0 decreases as $\theta_1$ increases, rather than being controlled solely by $\theta_2$.
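These properties are easy to confirm numerically. The sketch below normalizes the dyadwise pmf over a truncated range (assuming the tail beyond it is negligible) and compares $\Pr(Y_{i,j}=0)$ with the closed form above; the parameter values and function names are arbitrary illustrative choices.

```python
import math


def zero_modified_pmf(theta1, theta2, k_max=200):
    """Dyad pmf proportional to exp(theta1 * k + theta2 * 1[k > 0]) / k!, truncated."""
    log_w = [theta1 * k + (theta2 if k > 0 else 0.0) - math.lgamma(k + 1)
             for k in range(k_max + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def prob_zero_closed_form(theta1, theta2):
    """(1 + exp(theta2) * (exp(exp(theta1)) - 1))^{-1}, as stated in the text."""
    return 1.0 / (1.0 + math.exp(theta2) * (math.exp(math.exp(theta1)) - 1.0))


theta1, theta2 = math.log(3.0), -2.0  # negative theta2: zero-inflated
pmf = zero_modified_pmf(theta1, theta2)
print(pmf[0], prob_zero_closed_form(theta1, theta2))  # agree
# Among nonzero values, successive ratios match a Poisson(3): pmf[2] / pmf[1] = 3 / 2
print(pmf[2] / pmf[1])
```

The ratio of successive nonzero probabilities does not involve $\theta_2$, which is why the nonzero part keeps its Poisson shape.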

5.2.3. Dispersion modeling

Consider the social network of face-to-face conversations among people living in a region. A typical individual will likely not interact at all with the vast majority of others, have one-time or infrequent interactions with a large number of others (e.g., with clerks or tellers), and interact a great deal with a relatively small number of others (e.g., family, coworkers). Some of this may be accounted for by information about social roles and preexisting relationships, but if such information is not available, the result is a distribution that is highly overdispersed relative to the Poisson, or even to the zero-inflated Poisson. Overdispersed counts are often modeled using the negative binomial distribution (McCullagh and Nelder, 1989, p. 199). However, the negative binomial distribution with an unknown dispersion parameter is not an exponential family, making it difficult to fit using our inference techniques. We thus discuss two purely exponential-family approaches for dealing with non-Poisson-dispersed interaction counts in general and overdispersed counts in particular.

Conway–Maxwell–Poisson Distribution

The Conway–Maxwell–Poisson (CMP) distribution (Shmueli et al., 2005) is an exponential family for counts able to represent both under- and overdispersion: adding a sufficient statistic of the form

$$g_{\text{CMP}}(y)=\sum_{(i,j)\in\mathbb{Y}}\log(y_{i,j}!), \tag{5}$$

to a Poisson-reference ERGM otherwise fulfilling conditions for Poisson regression described in Section 5.2.1 turns a Poisson regression model for dyads into a CMP regression model.

Its coefficient, $\theta_{\text{CMP}}$, constrained by (3) to $\theta_{\text{CMP}}\le1$, controls the degree of dispersion: $\theta_{\text{CMP}}=0$ retains the Poisson distribution; $\theta_{\text{CMP}}<0$ induces underdispersion relative to Poisson, approaching the Bernoulli distribution as $\theta_{\text{CMP}}\to-\infty$; and $\theta_{\text{CMP}}>0$ induces overdispersion, attaining the geometric distribution at $\theta_{\text{CMP}}=1$, its most overdispersed point.
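A small numerical sketch of these regimes follows. It assumes a single dyad under a Poisson-reference ERGM with the dyad sum (coefficient $\theta_1$) and (5) (coefficient $\theta_{\text{CMP}}$), so that the pmf is proportional to $\exp(\theta_1k)(k!)^{\theta_{\text{CMP}}-1}$, and approximates the normalizing constant by truncation; function names and parameter values are hypothetical. The guard encodes the fact that at $\theta_{\text{CMP}}=1$ the sum is finite only when $\theta_1<0$.

```python
import math


def cmp_pmf(theta1, theta_cmp, k_max=500):
    """Dyad pmf for a Poisson-reference ERGM with statistics (sum y, sum log y!).

    Proportional to exp(theta1 * k + theta_cmp * log k!) / k!
    = exp(theta1 * k) * (k!)^(theta_cmp - 1), truncated at k_max.
    """
    if theta_cmp > 1 or (theta_cmp == 1 and theta1 >= 0):
        raise ValueError("outside the natural parameter space (3)")
    log_w = [theta1 * k + (theta_cmp - 1.0) * math.lgamma(k + 1)
             for k in range(k_max + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def mean_var(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


print(mean_var(cmp_pmf(0.0, 0.0)))            # Poisson(1): about (1, 1)
print(mean_var(cmp_pmf(math.log(0.5), 1.0)))  # geometric with Pr(0) = 1/2: about (1, 2)
print(mean_var(cmp_pmf(0.0, -1.0)))           # underdispersed: variance below the mean
```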

Normally, the greatest hurdle associated with using CMP is that its normalizing constant does not, in general, have a known closed form. In our case, because intractable normalizing constants are already accommodated by the methods of Section 4, using CMP requires no additional effort.

At the same time, CMP is neither regular nor steep (per Appendix B), so the properties of its estimators are not guaranteed, particularly for highly overdispersed data. We have found experimentally that counts as dispersed as the geometric distribution, or more so, often cause the fitting methods of Section 4 to fail.

Variance-like parameters

Some control over the variance can be attained by adding a statistic of the form $g_{\cdot^{a}}(y)=\sum_{(i,j)\in\mathbb{Y}} y_{i,j}^{a}$, a ≠ 1. Statistics with a > 1, such as $g_{\cdot^{2}}(y)=\sum_{(i,j)\in\mathbb{Y}} y_{i,j}^{2}$, suffer from the same problem as the Strauss point process (Kelly and Ripley, 1976): for any θ, ε > 0, $\lim_{y\to\infty}\exp(\theta y^{1+\varepsilon})/y! = \infty$, so (3) constrains θ ≤ 0, and such a statistic can represent only underdispersion.
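The divergence is slow but unavoidable. The sketch below evaluates the logarithm of a single term, $\theta y^{a}-\log y!$, for the illustrative value θ = 0.1 (an assumption), comparing a = 1/2, the statistic proposed next, with a = 1.5, i.e. ε = 0.5. The function name is hypothetical.

```python
import math

def log_term(y, theta, a):
    """log of exp(theta * y**a) / y!, i.e. theta*y**a - log(y!), via lgamma to avoid overflow."""
    return theta * y ** a - math.lgamma(y + 1)

theta = 0.1
for a in (0.5, 1.5):  # a = 1/2 is statistic (6); a = 1.5 is a y^(1+eps) statistic with eps = 0.5
    print(a, [round(log_term(y, theta, a)) for y in (10, 1_000, 10_000, 100_000)])
```

For a = 1.5, the term is negative at moderate y but turns positive and keeps growing, so the terms of the normalizing sum do not even tend to 0; for a = 1/2, the factorial dominates and the term decreases without bound.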

Thus, we propose to model dispersion by adding a statistic of the form

$$g_{\cdot^{1/2}}(y) = \sum_{(i,j)\in\mathbb{Y}} y_{i,j}^{1/2} = \sum_{(i,j)\in\mathbb{Y}} \sqrt{y_{i,j}}. \qquad (6)$$

To the extent that the counts are Poisson-like, the square root is a variance-stabilizing transformation (McCullagh and Nelder, 1989, p. 196). Then, a model with p = 2 and dyadwise sufficient statistic

$$g(y) = \Big(\sum_{(i,j)\in\mathbb{Y}} \sqrt{y_{i,j}},\ \sum_{(i,j)\in\mathbb{Y}} y_{i,j}\Big) \qquad (7)$$

may be viewed as modeling the first and second moments of $y_{i,j}$. That the highest-order term is still of the order of $y_{i,j}$ guarantees that $\Theta_N = \mathbb{R}^p$, a practical advantage over CMP.

As with CMP, the normalizing constant is intractable. To explore the shape of this distribution, we fixed θ1 at each of a range of values and found the values of θ2 for which the induced distribution had expected value 1. We then simulated from each resulting distribution. The estimated pmf for each configuration and its comparison with the geometric distribution with the same expectation are given in Figure 2. Smaller coefficients on (6) (θ1) correspond to greater dispersion, with the coefficient on the dyad sum (θ2) increasing to compensate, and vice versa; θ1 = 0 corresponds to a Poisson distribution. As the dispersion increases, the mean is preserved in part by increasing Pr(Yi,j = 0), and for sufficiently high values of yi,j, the geometric distribution still dominates. Thus, there is a trade-off between the convenience of a model without complex constraints on the parameter space and the ability to model greater dispersion. In practice, if the overdispersion is substantively due to unaccounted-for heterogeneity, the latter might not be a serious disadvantage, and excess zeros can be compensated for by a term from Section 5.2.2.
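The calibration behind Figure 2 can be sketched as follows. Rather than simulating, this version evaluates the dyadwise pmf of (7), $\Pr(Y=y)\propto\exp(\theta_1\sqrt{y}+\theta_2 y)/y!$, on a support truncated at `y_max` (an assumption), and finds θ2 by bisection, which works because the mean is increasing in θ2, the canonical parameter of y. The search bracket, the grid of θ1 values, and all names are illustrative assumptions.

```python
import math

def pmf_model7(theta1, theta2, y_max=200):
    """Dyadwise pmf of model (7): proportional to exp(theta1*sqrt(y) + theta2*y) / y!.

    theta1 is the coefficient on statistic (6), theta2 the one on the dyad sum;
    the support is truncated at y_max (an assumption).
    """
    log_w = [theta1 * math.sqrt(y) + theta2 * y - math.lgamma(y + 1)
             for y in range(y_max + 1)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p):
    return sum(y * py for y, py in enumerate(p))

def theta2_for_mean(theta1, target=1.0, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisect on theta2 until the dyadwise mean equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(pmf_model7(theta1, mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for theta1 in (0.0, -1.0, -2.0):
    theta2 = theta2_for_mean(theta1)
    p = pmf_model7(theta1, theta2)
    var = sum((y - 1.0) ** 2 * py for y, py in enumerate(p))
    # Pr(Y = 10 | Y > 0) versus the mean-1 geometric distribution, for which it is 0.5**10.
    print(theta1, round(theta2, 3), round(p[0], 3), round(var, 3),
          p[10] / (1 - p[0]) < 0.5 ** 10)
```

As θ1 decreases, θ2 rises to keep the mean at 1, Pr(Y = 0) and the variance grow, and the conditional upper tail still stays below that of the geometric distribution.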

Fig 2.

Dyadwise distributions attainable by the model (7). Because Pr(Y = 0) varies greatly for different θ1 yet can be adjusted separately by an appropriate model term, we plot the probabilities conditional on Y > 0.

5.2.4. Mutuality

Many directed networks, such as friendship nominations, exhibit mutuality: other things being equal, if a tie (i, j) exists, a tie (j, i) is more likely to exist as well. Binary ERGMs can model this phenomenon using the sufficient statistic $g(y)=\sum_{(i,j)\in\mathbb{Y},\,i<j} y_{i,j}y_{j,i}=\sum_{(i,j)\in\mathbb{Y},\,i<j}\min(y_{i,j},y_{j,i})$, which counts the reciprocated ties (Holland and Leinhardt, 1981). Other sufficient statistics that can model it include $g(y)=\sum_{(i,j)\in\mathbb{Y},\,i<j}\mathbb{1}_{y_{i,j}\neq y_{j,i}}$ and $g(y)=\sum_{(i,j)\in\mathbb{Y},\,i<j}\mathbb{1}_{y_{i,j}=y_{j,i}}$, the counts of asymmetric and symmetric dyads, respectively (Morris, Handcock and Hunter, 2008).

In the presence of an edge count term, these three are simply different parametrizations of the same distribution family:

$$y_{i,j}y_{j,i} = \frac{(y_{i,j}+y_{j,i}) - \mathbb{1}_{y_{i,j}\neq y_{j,i}}}{2} = \frac{(y_{i,j}+y_{j,i}) - 1 + \mathbb{1}_{y_{i,j}=y_{j,i}}}{2}.$$
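The identity can be checked by enumerating the four binary dyad states. The sketch below (function name hypothetical) also shows that it fails once values exceed 1, which is why the three statistics stop being interchangeable for counts.

```python
from itertools import product

def identities_hold(a, b):
    """a*b == ((a + b) - 1[a != b]) / 2 == ((a + b) - 1 + 1[a == b]) / 2 for dyad values a, b."""
    s = a + b
    return a * b == (s - (a != b)) / 2 == (s - 1 + (a == b)) / 2

print(all(identities_hold(a, b) for a, b in product((0, 1), repeat=2)))  # True
print(identities_hold(2, 2))  # False: 2 * 2 = 4, but both right-hand sides equal 2
```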

Nevertheless, these three different statistics suggest two major ways to generalize the terms to count data: by evaluating a product or a minimum of the values, or by evaluating their similarity or difference. We discuss them in turn.

Product

It is tempting to model mutuality for count data in the same manner as for binary data, with yi,j and yj,i being values rather than indicators. For example, a simple model with overall dyad mean and reciprocity terms, with p = 2 and

$$g(y) = \Big(\sum_{(i,j)\in\mathbb{Y}} y_{i,j},\ \sum_{(i,j)\in\mathbb{Y},\,i<j} y_{i,j}\,y_{j,i}\Big)$$

would have a conditional Poisson distribution:

$$Y_{i,j} \mid Y\in\mathcal{Y}_{i,j}(y) \sim \text{Poisson}\big(\mu = \exp(\theta_1 + \theta_2\,y_{j,i})\big),$$

a desirable property. However, because $\lim_{y\to\infty}\exp(cy^2)/(y!)^2=\infty$ for any c > 0, (3) is not fulfilled for θ2 > 0, which is the range representing positive mutuality. (Note that the expected value of Yi,j is exponential in the value of Yj,i and vice versa. Again, the Strauss point process exhibits a similar problem (Kelly and Ripley, 1976).)

Geometric mean

As with dispersion, the problem can be alleviated by using the geometric mean of yi,j and yj,i instead of their product. As in Section 5.2.3, this choice may be justified as an analog of covariance on variance-stabilized counts. This changes the shape of the distribution in ways that are difficult to interpret: if

$$g(y) = \Big(\sum_{(i,j)\in\mathbb{Y}} y_{i,j},\ \sum_{(i,j)\in\mathbb{Y},\,i<j} \sqrt{y_{i,j}\,y_{j,i}}\Big),$$

then

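$$\Pr_{\theta;h,\eta,g}\big(Y_{i,j}=y_{i,j}\mid Y\in\mathcal{Y}_{i,j}(y)\big) \propto \exp\big(\theta_1 y_{i,j} + (\theta_2\sqrt{y_{j,i}})\sqrt{y_{i,j}}\big)/y_{i,j}!,$$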

and, with nonzero yj,i, the probabilities of greater values of Yi,j are inflated by more. The analogy to covariance further suggests centering the statistic:

$$g(y) = \sum_{(i,j)\in\mathbb{Y},\,i<j} \big(\sqrt{y_{i,j}}-\bar{y}\big)\big(\sqrt{y_{j,i}}-\bar{y}\big),$$

for

$$\bar{y} \equiv \frac{1}{|\mathbb{Y}|}\sum_{(i,j)\in\mathbb{Y}} \sqrt{y_{i,j}}. \qquad (8)$$

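A sketch of the centered statistic, assuming a directed count matrix stored as a list of lists whose diagonal is unused and whose dyad set $\mathbb{Y}$ is all ordered pairs of distinct actors; the function name is hypothetical.

```python
import math

def mutual_geomean_centered(y):
    """Sum over i < j of (sqrt(y[i][j]) - ybar) * (sqrt(y[j][i]) - ybar), with ybar as in (8)."""
    n = len(y)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    ybar = sum(math.sqrt(y[i][j]) for i, j in dyads) / len(dyads)
    return sum((math.sqrt(y[i][j]) - ybar) * (math.sqrt(y[j][i]) - ybar)
               for i in range(n) for j in range(i + 1, n))

y = [[0, 4],
     [1, 0]]
print(mutual_geomean_centered(y))  # -0.25: ybar = (2 + 1) / 2 = 1.5, (2 - 1.5) * (1 - 1.5)
```

Centering means that a dyad in which both values are above average contributes positively, whereas one with a high and a low value contributes negatively, as a covariance would.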
Minimum

An alternative generalization is to take the minimum of the two values. For example, if

$$g(y) = \Big(\sum_{(i,j)\in\mathbb{Y}} y_{i,j},\ \sum_{(i,j)\in\mathbb{Y},\,i<j} \min(y_{i,j},y_{j,i})\Big),$$

then

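$$\Pr_{\theta;h,\eta,g}\big(Y_{i,j}=y_{i,j}\mid Y\in\mathcal{Y}_{i,j}(y)\big) \propto \exp\big(\theta_1 y_{i,j} + \theta_2\min(y_{i,j}-y_{j,i},\,0)\big)/y_{i,j}!. \qquad (9)$$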

Thus, a possible interpretation of this term is that the conditional probability of a particular value yi,j of Yi,j is deflated by a factor of exp(θ2) for every unit by which yi,j falls short of yj,i. In a sense, yj,i “pulls up” yi,j to its level and vice versa.

Negative difference

Generalizing the concept of similarity between yi,j and yj,i leads to a statistic of difference between their values. We negate it so that a positive coefficient value leads to greater mutuality. Then,

$$g(y) = \Big(\sum_{(i,j)\in\mathbb{Y}} y_{i,j},\ -\sum_{(i,j)\in\mathbb{Y},\,i<j} |y_{i,j}-y_{j,i}|\Big), \qquad (10)$$

and

$$\Pr_{\theta;h,\eta,g}\big(Y_{i,j}=y_{i,j}\mid Y\in\mathcal{Y}_{i,j}(y)\big) \propto \exp\big(\theta_1 y_{i,j} - \theta_2|y_{i,j}-y_{j,i}|\big)/y_{i,j}!,$$

so the conditional probability of a particular yi,j is deflated by exp (θ2) for every unit difference from yj,i, in either direction. Thus, yj,i “pulls in” yi,j and vice versa. Of course, other differences (e.g., squared difference) are also possible.

We use the discrete change statistic to visualize the differences among these variants in Figure 3, plotting the $\theta\cdot\Delta_{i,j}^{0\to y_{i,j}} g(y)$ summand of

$$\log\frac{\Pr_{\theta;h,\eta,g}\big(Y_{i,j}=y_{i,j}\mid Y\in\mathcal{Y}_{i,j}(y)\big)}{\Pr_{\theta;h,\eta,g}\big(Y_{i,j}=0\mid Y\in\mathcal{Y}_{i,j}(y)\big)} = \theta\cdot\Delta_{i,j}^{0\to y_{i,j}} g(y) - \log(y_{i,j}!)$$

for each variant. Lastly, while the conditional distributions, and hence the parameter interpretations, for the minimum and the negative difference statistics are different, the models induced by (9) and (10) are also reparametrizations of each other: $\min(y_{i,j},y_{j,i})=\frac{1}{2}\big((y_{i,j}+y_{j,i})-|y_{i,j}-y_{j,i}|\big)$.

Fig 3.

Effect of the proposed mutuality statistics (g) with parameter θ > 0 on the distribution of Yi,j, given that Yj,i = yj,i. The $\min(y_{i,j},y_{j,i})$ statistic deflates the probabilities of those values of yi,j that are less than yj,i, thereby inflating all of those at or above it and “pulling Yi,j up”; the $-|y_{i,j}-y_{j,i}|$ statistic deflates the probabilities in both directions away from yj,i, thereby inflating those closest to it and “pulling Yi,j in”; and $\sqrt{y_{i,j}y_{j,i}}$ inflates greater values of yi,j in general, and by more for greater yj,i.

5.2.5. Actor heterogeneity

It is often the case that different actors in a network have different overall propensities to have ties: they are heterogeneous in their gregariousness, popularity, and/or (undirected) sociality. Some of this heterogeneity may be accounted for by exogenous covariates. For the unaccounted-for heterogeneity, two major approaches have been used: conditional, in which actor-specific parameters are added to the model to absorb its effects, and marginal, in which statistics are added that represent the effects of heterogeneity on the overall network features. Examples of the conditional approach include the very first exponential-family model for networks, the p1 model, which used a fixed effect for every actor (Holland and Leinhardt, 1981); and the p2 model and latent space models, which used random effects instead (van Duijn, Snijders and Zijlstra, 2004; Hoff, 2005; Krivitsky et al., 2009; Mariadassou, Robin and Vacher, 2010). The marginal approach includes the count of k-stars for k ≥ 2 (Frank and Strauss, 1986), which, for a fixed network density, become more prevalent as heterogeneity increases, at the cost of often inducing ERGM degeneracy; alternating k-stars and geometrically weighted degree statistics (Snijders et al., 2006; Hunter and Handcock, 2006), which attempt to remedy the degeneracy of k-stars; and statistics such as the square root degree activity/popularity, which sum each actor’s degree raised to the 3/2 power and thus also increase with greater heterogeneity, but not as rapidly as 2-stars do (Snijders, van de Bunt and Steglich, 2010), thereby avoiding degeneracy. In the conditional approach, using fixed effects lacks parsimony, and using random effects creates a problem with a doubly-intractable normalizing constant that is beyond the scope of this paper, so we develop a marginal approach here.

Actor heterogeneity may be viewed marginally as positive within-actor correlation among the dyad values. Following the discussion in the previous sections, we propose a form of pooled within-actor covariance of variance-stabilized dyad values, scaled to the same magnitude as the dyad sum:

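$$g_{\text{out cov.}}(y) = \sum_{i\in N}\frac{1}{n-2}\sum_{j,k\in\mathbb{Y}_i,\;j<k}\big(\sqrt{y_{i,j}}-\bar{y}\big)\big(\sqrt{y_{i,k}}-\bar{y}\big), \qquad (11)$$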

for $\mathbb{Y}_i$ being the set of actors to whom i may have ties ($\equiv\{j' : (i,j')\in\mathbb{Y}\}$) and $\bar{y}$ defined in (8). This statistic increases with greater out-tie heterogeneity; an analogous statistic can be specified for in-tie heterogeneity, and dropping the directionality produces an undirected version of this statistic.
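A direct sketch of (11), assuming a directed network whose dyad set $\mathbb{Y}$ contains all ordered pairs of distinct actors (so $\mathbb{Y}_i$ is every other actor) and n ≥ 3; the function name and the example network are hypothetical.

```python
import math

def out_sqrt_cov(y):
    """Statistic (11) for a directed count matrix y (list of lists; diagonal ignored)."""
    n = len(y)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    ybar = sum(math.sqrt(y[i][j]) for i, j in dyads) / len(dyads)  # as in (8)
    total = 0.0
    for i in range(n):
        c = [math.sqrt(y[i][j]) - ybar for j in range(n) if j != i]  # centered values over Y_i
        pairs = sum(c[a] * c[b] for a in range(len(c)) for b in range(a + 1, len(c)))
        total += pairs / (n - 2)
    return total

# One actor sends 9 to everyone; nobody else sends anything.
one_sender = [[0, 9, 9, 9],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
print(out_sqrt_cov(one_sender))  # 10.125
```

Because each actor's centered values share a sign in this example, every within-actor product is positive, so the statistic is large; a network with the same value on every dyad would give exactly 0.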

We have considered other variants, including the uncentered version, in which each summand in (11) is simply $\sqrt{y_{i,j}y_{i,k}}$. We found that in undirected networks in particular, such a model term can induce a degeneracy-like bimodal distribution of networks. (This is likely because in undirected networks, the positive dependence is not contained within each actor, so subtracting $\bar{y}$ serves as a counterweight to avert the “avalanche”.)

5.2.6. Triad-closure bias

We now turn to the question of how to represent triad-closure bias (friend-of-a-friend effects) in count data. As with mutuality, merely multiplying the values of the dyads in a triad leads to a model that cannot have positive triad-closure bias. In addition, ERGM sufficient statistics that take counts over triads often exhibit degeneracy (Schweinberger, 2011). For these reasons, we describe a family of statistics that sum over dyads instead. Wyatt, Choudhury and Bilmes (2010) use a generalization of the curved geometrically-weighted edgewise shared partners (GWESP) statistic (Hunter and Handcock, 2006), though it is not clear whether it is suitable for data with an infinite sample space. We thus describe a more conservative family of statistics.

One term used to model triad closure in binary dynamic networks by Snijders, van de Bunt and Steglich (2010) is the transitive ties effect, the most conservative special case of the GWESP (Hunter and Handcock, 2006) statistic. This statistic counts the number of ties (i, j) such that there exists at least one path of length 2 (two-path) between them — a third actor k such that yi,k = yk,j = 1. (Unlike the triangle count, each tie may contribute at most +1 to the statistic, no matter how many such ks exist.)

One generalization of this statistic to counts is

$$g_{\text{trans.ties}}(y) = \sum_{(i,j)\in\mathbb{Y}} \min\Big(y_{i,j},\ \max_{k\in N}\big(\min(y_{i,k},y_{k,j})\big)\Big). \qquad (12)$$

Intuitively, define the strength of a two-path from i to j to be the minimum of the values along the path. The statistic is then the sum over the dyads (i, j) of the minimum of the value of (i, j) and the value of the strongest two-path between them. The interpretation is thus somewhat analogous to that of the minimum mutuality statistic, with $y_{j,i}$ replaced by $\max_{k\in N}\min(y_{i,k},y_{k,j})$. The motivation for using minimum, as opposed to negative absolute difference, to combine the two-path value with the focus dyad value is that the intuitive notion of friend-of-a-friend effect that this statistic embodies suggests that while the presence of a mutual friend may increase the probability or expected value of a particular friendship (i.e., “pull it up”), it should not limit it (i.e., “pull it in”) as an absolute difference would. These interpretations are somewhat oversimplified: it is just as true that a positive coefficient on this statistic results in yi,j “pulling up” the potential two-paths between i and j.

In a directed network, (12) would model transitive (hierarchical) triads, while

$$g_{\text{cycl.ties}}(y) = \sum_{(i,j)\in\mathbb{Y}} \min\Big(y_{i,j},\ \max_{k\in N}\big(\min(y_{j,k},y_{k,i})\big)\Big)$$

would model cyclical (antihierarchical) triads.
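Both statistics can be computed by brute force in O(n³) time. The sketch below assumes a directed count matrix stored as a list of lists with a zero diagonal; it lets k range over N \ {i, j}, which gives the same value as k ∈ N when the diagonal is zero. The function name and the two toy networks are hypothetical.

```python
def triad_ties(y, cyclic=False):
    """Statistic (12) (cyclic=False) or its cyclical analogue (cyclic=True)."""
    n = len(y)
    total = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            others = [k for k in range(n) if k not in (i, j)]
            if cyclic:  # two-paths j -> k -> i, which close a cycle with i -> j
                strengths = [min(y[j][k], y[k][i]) for k in others]
            else:       # two-paths i -> k -> j, which make i -> j transitive
                strengths = [min(y[i][k], y[k][j]) for k in others]
            # A two-path is as strong as its weaker segment; take the strongest, capped at y_ij.
            total += min(y[i][j], max(strengths, default=0))
    return total

transitive = [[0, 3, 5],
              [0, 0, 2],
              [0, 0, 0]]
cycle = [[0, 3, 0],
         [0, 0, 2],
         [4, 0, 0]]
print(triad_ties(transitive), triad_ties(transitive, cyclic=True))  # 2 0
print(triad_ties(cycle), triad_ties(cycle, cyclic=True))            # 0 6
```

In the transitive network, only (0, 2) is closed by a two-path (0 → 1 → 2, of strength 2); in the cycle, each of the three ties is closed by a two-path of strength at least 2.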

The statistic (12) is a fairly conservative one, less likely to induce excessive dependence and bimodality, at the cost of sensitivity. More generally, one may specify a triadic statistic using three functions: first, $\upsilon_{\text{2-path}}:\mathbb{N}_0^2\to\mathbb{R}$, how the “value” of a two-path i → k → j is computed from its constituent segments; second, $\upsilon_{\text{combine}}:\mathbb{R}^{n-2}\to\mathbb{R}$, how the values of the possible two-paths from i to j are combined with each other to compute the strength of the pressure on i and j to close the triad or increase their interaction; and third, $\upsilon_{\text{affect}}:\mathbb{N}_0\times\mathbb{R}\to\mathbb{R}$, how this pressure affects Yi,j. Given these,

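$$g_{\upsilon}(y) = \sum_{(i,j)\in\mathbb{Y}} \upsilon_{\text{affect}}\Big(y_{i,j},\ \upsilon_{\text{combine}}\big(\upsilon_{\text{2-path}}(y_{i,k},y_{k,j})_{k\in N\setminus\{i,j\}}\big)\Big). \qquad (13)$$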

Thus, for example, one could set υcombine to sum its arguments rather than take their maximum, or replace taking the minimum with taking a geometric mean. We illustrate the difference this makes in Section 6.
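A sketch of (13) with the three functions passed in as arguments, assuming a directed count matrix with a zero diagonal and n ≥ 3. The configuration (minimum, maximum, minimum) reproduces (12); (geometric mean, sum, geometric mean) is the less conservative variant used in Section 6. The names `triadic_stat`, `conservative`, and `liberal` are hypothetical.

```python
import math

def triadic_stat(y, two_path, combine, affect):
    """Statistic (13): sum over (i, j) of affect(y_ij, combine(two-path values over k))."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            strengths = [two_path(y[i][k], y[k][j]) for k in range(n) if k not in (i, j)]
            total += affect(y[i][j], combine(strengths))
    return total

def geomean(a, b):
    return math.sqrt(a * b)

conservative = dict(two_path=min, combine=max, affect=min)   # statistic (12)
liberal = dict(two_path=geomean, combine=sum, affect=geomean)  # variant from Section 6

y = [[0, 3, 5],
     [0, 0, 2],
     [0, 0, 0]]
print(triadic_stat(y, **conservative))  # 2.0
print(triadic_stat(y, **liberal))       # sqrt(5 * sqrt(6)), about 3.50
```

Summing over k lets several weak two-paths add up, and the geometric means respond to any nonzero segment, which is why this variant is more sensitive than (12).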

6. Application to interactions within a fraternity

In a series of studies in the 1970s, Bernard, Killworth and Sailer (1979–1980) assessed the accuracy of retrospective sociometric surveys in a number of settings, including a college fraternity whose 58 occupants had all lived there for at least three months. To record the true amounts of interaction, unobtrusive observers periodically walked through the fraternity over several days and noted the students engaged in conversation. Obtaining these network data from Batagelj and Mrvar (2006), we model these observed pairwise interaction counts.

The raw distribution of counts, given in Figure 4(a), appears to be strongly overdispersed relative to the Poisson and, indeed, relative to the geometric distribution: the mean of the counts is 1.9, while their standard deviation (not variance) is 3.4. At least some of this is due to actor heterogeneity: the square root of the within-actor variance of the counts is 3.1. Excluding extreme observations (all values over 20) does not make a qualitative difference (the statistics become 1.8, 2.8, and 2.5, respectively). Nor does there appear to be a natural place to threshold the counts to produce a binary network (see Figure 4(b)). We thus model the baseline shape of the distribution of counts using the following terms: the number of dyads with nonzero value (baseline propensity to have ties); the sum of dyad values (baseline intensity of interactions); and the statistic (6) (underdispersion).

Fig 4.

Conversation count summaries for Bernard, Killworth and Sailer fraternity network

(We have also attempted to use CMP (via (5)) but found the process to be unstable due to the greater-than-geometric level of dispersion.)

Little was recorded about the social roles of the fraternity members, so we consider the effects of endogenous social forces: actor heterogeneity, via the undirected version of (11), and transitivity of intensities, via the statistic (12).

Faust (2007), in particular, found that in many empirical networks, much of the apparent triadic effect is accounted for by variation in the degree distribution and other lower-order effects. We thus consider four models, baseline shape only (B), baseline with heterogeneity (BH), baseline with transitivity (BT), and all terms (BHT), to explore this finding in a valued setting.

We report the model fits in Table 1. MCMC diagnostics, described by Goodreau et al. (2008), show adequate mixing and unimodal distributions of sufficient statistics, and networks simulated from these fits have, on average, statistics equal to the observed sufficient statistics. The baseline dyadwise distribution terms are difficult to interpret, but the highly negative coefficient on underdispersion suggests a strong degree of overdispersion, as expected. Some of this overdispersion appears to be absorbed by modeling actor heterogeneity, however. There are indications of a high degree of heterogeneity in individuals’ propensity to interact, over and above that expected under even the overdispersed baseline distribution (Monte Carlo P-val. < 0.001 based on 10,000 draws).

Table 1.

Results from fitting the models to Bernard, Killworth and Sailer fraternity network

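Estimates (Std. Errors)

| Term | B | BH | BT | BHT |
|---|---|---|---|---|
| Ties | **5.60** (0.21) | **4.96** (0.17) | **6.24** (0.21) | **4.98** (0.17) |
| Intensity | **3.65** (0.05) | **3.13** (0.06) | **3.40** (0.07) | **3.12** (0.06) |
| Underdispersion | **−9.71** (0.22) | **−8.23** (0.20) | **−10.52** (0.22) | **−8.26** (0.19) |
| Heterogeneity | | **1.48** (0.06) | | **1.46** (0.07) |
| Transitivity | | | **0.46** (0.05) | 0.03 (0.04) |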

Coefficients statistically significant at α = 0.05 are bolded.

Standard errors incorporate the uncertainty introduced by approximating the likelihood using MCMC (Hunter and Handcock, 2006).

Without accounting for actor heterogeneity (i.e., Model BT), there appears to be a strong transitivity effect (a friend of a friend is a friend), and the Monte Carlo test confirms this with a similar P-value. However, once actor heterogeneity is accounted for, the transitivity effect vanishes (simulated one-sided P-val. = 0.43), suggesting that the underlying social process is better explained by a relatively small number of highly social individuals whose ties to each other and to (less social) third parties create excess transitive ties for the overall amount of interaction observed. At the same time, if instead of using (12) as the test statistic we use a less conservative statistic of the form (13) with $\upsilon_{\text{2-path}}(x_1,x_2)=\sqrt{x_1x_2}$ (geometric mean), $\upsilon_{\text{combine}}(x_1,\ldots,x_{n-2})=\sum_{k=1}^{n-2}x_k$, and $\upsilon_{\text{affect}}(x_1,x_2)=\sqrt{x_1x_2}$, the effect’s significance seems to increase (one-sided P-val. = 0.07). However, when we attempted to fit the model with this effect, the process exhibited degeneracy-like bimodality. This suggests a trade-off between stability and the power to detect subtle effects.

7. Discussion

We have generalized the exponential-family random graph models to networks whose relationships are unbounded counts, explored the issues that arise when generalizing, and proposed ways to model several common network features for count data. We demonstrated our development by a study of the interaction of individual heterogeneity and friend-of-a-friend effects in a network with a hard-to-model dyadwise count distribution.

This paper focused on modeling counts. More generally, one can define a valued ERGM by replacing the set of possible dyad values ℕ0 with a more general set 𝕊 and replacing h(y) with a more general σ-finite measure space (𝒴, 𝖸, Ph) with reference measure Ph, and then postulating a probability measure $P_{\theta;P_h,\eta,g}$ whose Radon-Nikodym derivative with respect to Ph is

$$\frac{dP_{\theta;P_h,\eta,g}}{dP_h}(y) = \frac{\exp\big(\eta(\theta)\cdot g(y)\big)}{\kappa_{P_h,\eta,g}(\theta)},$$

(Barndorff-Nielsen, 1978, pp. 115–116; Brown, 1986, pp. 1–2) with the normalizing constant

$$\kappa_{P_h,\eta,g}(\theta) = \int_{\mathcal{Y}} \exp\big(\eta(\theta)\cdot g(y)\big)\,dP_h(y).$$

For binary and count data, and discrete data in general, Ph could be specified as a function relative to the counting measure, while for continuous data, it could be defined with respect to the Lebesgue measure. Still, as with count data, the shape of this function would need to be specified.

Other scenarios might call for more complex specifications of the reference measure. Some network data, such as durations of conversations (Wyatt, Choudhury and Bilmes, 2010) and international trade volumes (Westveld and Hoff, 2011), are continuous measurements except that there is a positive probability of two actors not conversing at all or of two countries having no measured trade. Westveld and Hoff use a normal distribution to model the log-transformed trade volume, imputing 0 = log(1) for zero observed trade volumes (all nonzero trade volumes being greater than 1 unit); they note this issue and address it by pointing out that in their (latent-variable) model, the impact of such an outlier would be contained. Valued ERGMs may provide a more principled approach by specifying a semicontinuous Ph, such as one that puts a mass of 1/2 on 0 and 1/2 on Lebesgue measure on (0, ∞).

We have also focused on data that do not impose any constraints on the sample space: $\mathcal{Y}\equiv\mathbb{S}^{\mathbb{Y}}$. However, some types of network data, such as those where each actor (ego) ranks the others (alters) (e.g., Newcomb, 1961), can be viewed in this framework as having a constrained sample space: setting 𝕊 = {1, …, n − 1} and constraining 𝒴 to ensure that each ego assigns a unique rank to each alter gives a sample space of permutations that could, with the counting measure, serve as the reference measure for an ERGM on rank data. These and other applications are subjects of ongoing and future work.

This paper focuses on models for cross-sectional networks, where a single snapshot of relationship states, or of relationships aggregated over a time period, is observed. For longitudinal data, comprising multiple snapshots of networks on the same actors over time, binary ERGMs have been used as a basis for discrete-time models of network tie evolution by Robins and Pattison (2001), Wyatt, Choudhury and Bilmes (2009; 2010), Hanneke, Fu and Xing (2010), Krivitsky and Handcock (2010), and others. Valued ERGMs can be directly applied to the temporal ERGMs of Hanneke, Fu and Xing (2010), although their adaptation to the work of Krivitsky and Handcock (2010) may be less straightforward, especially if the interpretability benefits of the separable models are to be retained.

In practice, networks are not always observed completely. Handcock and Gile (2010) develop an approach to ERGM inference for partially observed or sampled binary networks. It would be natural to extend this approach to valued networks and valued ERGMs.

Some methods for assessing a network model’s fit, particularly MCMC diagnostics (Goodreau et al., 2008), can be used with little or no modification. Others, such as the goodness-of-fit methods of Hunter, Goodreau and Handcock (2008), may require the development of characteristics meaningful for valued networks. It may also be possible to extend the stability criteria of Schweinberger (2011) to models with infinite sample spaces.

Acknowledgments

The author thanks Mark S. Handcock for helpful discussions and comments on early drafts; Stephen E. Fienberg for his feedback on this manuscript; Michael Schweinberger, David R. Hunter, Tom A. B. Snijders, and Xiaoyue Niu for their feedback and advice; and the Editor, anonymous Associate Editor, and two anonymous referees for comments and suggestions that have greatly improved this paper. This research was supported by Portuguese Foundation for Science and Technology Ciência 2009 Program, ONR award N000140811015, and NIH award 1R01HD068395-01.

Appendix A: A sampling algorithm for a Poisson-reference ERGM

We use a Metropolis-Hastings sampling algorithm (Algorithm 1) to sample from a Poisson-reference ERGM, using a Poisson kernel with its mode at the present value of a dyad and, occasionally (with a specified probability π0), proposing a jump directly to 0. Because, as we discuss in Section 5.2.2, counts of interactions are often zero-inflated relative to Poisson, setting π0 > 0 can be used to speed up mixing. For highly overdispersed distributions, the Poisson kernel may be trivially replaced by a geometric or even a negative binomial kernel.

This algorithm selects the dyad on which the jump is to be proposed at random. A possible improvement would be to adapt the tie-no-tie (TNT) proposal (Morris, Handcock and Hunter, 2008), which optimizes sampling in sparse (zero-inflated) networks by focusing on dyads that have nonzero values, to this algorithm.

Algorithm 1.

Sampling from a Poisson-reference ERGM with no constraints, optimized for zero-inflated distributions

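Let:
  RandomChoose(A) return a random element of a set A
  Uniform(a, b) return a random draw from the Uniform(a, b) distribution
  $\text{Poisson}_y(\lambda)$ return a random draw from the Poisson(λ) distribution, conditional on not drawing y
  $p(y^*;y)=\dfrac{\exp\big(-(y+\frac{1}{2})\big)\,(y+\frac{1}{2})^{y^*}/y^*!}{1-\exp\big(-(y+\frac{1}{2})\big)\,(y+\frac{1}{2})^{y}/y!}$, the pmf of a $\text{Poisson}_y(y+\frac{1}{2})$ draw
Input: $y^{(0)}\in\mathcal{Y}$, T sufficiently large, 𝕐, g, η, π0 ∈ [0, 1)
Output: a draw from the specified Poisson-reference ERGM
  for t ← 1‥T do
    (i, j) ← RandomChoose(𝕐) {Select a dyad at random.}
    if $y_{i,j}^{(t-1)}\neq 0$ ∧ Uniform(0, 1) < π0 then
      $y^*\leftarrow 0$ {Propose a jump to 0 with probability π0.}
    else
      $y^*\leftarrow\text{Poisson}_{y_{i,j}^{(t-1)}}\big(y_{i,j}^{(t-1)}+\frac{1}{2}\big)$ {Propose a jump to a new value.}
    $q\leftarrow\begin{cases}\dfrac{\pi_0+(1-\pi_0)\,p(0;y^*)}{p(y^*;0)} & y_{i,j}^{(t-1)}=0\\[1ex]\dfrac{p(y_{i,j}^{(t-1)};0)}{\pi_0+(1-\pi_0)\,p(0;y_{i,j}^{(t-1)})} & y_{i,j}^{(t-1)}\neq 0\wedge y^*=0\\[1ex]\dfrac{(1-\pi_0)\,p(y_{i,j}^{(t-1)};y^*)}{(1-\pi_0)\,p(y^*;y_{i,j}^{(t-1)})} & \text{otherwise}\end{cases}$
    $r\leftarrow q\times\dfrac{y_{i,j}^{(t-1)}!}{y^*!}\times\exp\Big(\eta(\theta)\cdot\Delta_{i,j}^{y_{i,j}^{(t-1)}\to y^*}g\big(y^{(t-1)}\big)\Big)$
    if Uniform(0, 1) < r then
      $y^{(t)}\leftarrow y^{(t-1)}_{(i,j)=y^*}$ {Accept the proposal.}
    else
      $y^{(t)}\leftarrow y^{(t-1)}$ {Reject the proposal.}
  return $y^{(T)}$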
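A minimal Python sketch of Algorithm 1, assuming a directed network stored as a list of lists, a linear ERGM with η(θ) = θ, and a user-supplied function g returning the sufficient statistics as a tuple. For simplicity, it recomputes g after every proposal instead of using change statistics, and it draws Poisson variates by sequential inversion, which is adequate only for moderate means (well below several hundred). All function names are hypothetical. With g equal to the dyad sum alone and θ = log 2, the target is a network of independent Poisson(2) dyads, which gives a simple check.

```python
import math
import random

def log_pois(k, lam):
    """log pmf of Poisson(lam) at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_p(y_star, y):
    """log p(y*; y): the Poisson(y + 1/2) pmf conditional on not drawing y."""
    lam = y + 0.5
    return log_pois(y_star, lam) - math.log1p(-math.exp(log_pois(y, lam)))

def draw_p(y, rng):
    """Draw from Poisson(y + 1/2) conditional on the draw differing from y (by rejection)."""
    lam = y + 0.5
    while True:
        u, k, pk = rng.random(), 0, math.exp(-lam)
        cdf = pk
        while u > cdf and pk > 0.0:  # sequential inversion of the Poisson cdf
            k += 1
            pk *= lam / k
            cdf += pk
        if k != y:
            return k

def sample_ergm(y0, theta, g, T, pi0=0.2, seed=None):
    """Metropolis-Hastings sampler following Algorithm 1 (directed, no constraints)."""
    rng = random.Random(seed)
    y = [row[:] for row in y0]
    n = len(y)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    stats = g(y)
    for _ in range(T):
        i, j = rng.choice(dyads)               # select a dyad at random
        old = y[i][j]
        if old != 0 and rng.random() < pi0:
            new = 0                            # propose a jump straight to 0
        else:
            new = draw_p(old, rng)             # Poisson kernel with its mode at old
        # log q: log of the reverse-to-forward proposal probability ratio
        if old == 0:
            log_q = math.log(pi0 + (1 - pi0) * math.exp(log_p(0, new))) - log_p(new, 0)
        elif new == 0:
            log_q = log_p(old, 0) - math.log(pi0 + (1 - pi0) * math.exp(log_p(0, old)))
        else:
            log_q = log_p(old, new) - log_p(new, old)
        y[i][j] = new
        new_stats = g(y)
        log_r = (log_q + math.lgamma(old + 1) - math.lgamma(new + 1)
                 + sum(t * (a - b) for t, a, b in zip(theta, new_stats, stats)))
        if math.log1p(-rng.random()) < log_r:  # log of a Uniform(0, 1] draw
            stats = new_stats                  # accept
        else:
            y[i][j] = old                      # reject
    return y

draw = sample_ergm([[0] * 10 for _ in range(10)], (math.log(2.0),),
                   lambda y: (sum(map(sum, y)),), T=20_000, seed=1)
values = [draw[i][j] for i in range(10) for j in range(10) if i != j]
print(sum(values) / len(values))  # should be near 2
```

Working on the log scale avoids overflow in the factorial and exponential factors of r; the proposal ratio is identical to q in Algorithm 1, with the (1 − π0) factors cancelling in the last case.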

Appendix B: Non-steepness of the Conway–Maxwell–Poisson family

Expressed in its exponential-family canonical form, a random variable X with the Conway–Maxwell–Poisson distribution has the pmf

$$\Pr_{\theta;\eta,g}(X=x) = \frac{\exp\big(\theta_1 x + \theta_2\log(x!)\big)}{\kappa_{\eta,g}(\theta)}, \quad x\in\mathbb{N}_0$$

with the normalizing constant

$$\kappa_{\eta,g}(\theta) = \sum_{x=0}^{\infty} \exp\big(\theta_1 x + \theta_2\log(x!)\big).$$

Theorem B.1. The Conway–Maxwell–Poisson family is not regular.

Proof. The natural parameter space of CMP is

$$\Theta_N = \{\theta\in\mathbb{R}^2 : \theta_2 < 0 \vee (\theta_2 = 0 \wedge \theta_1 < 0)\}$$

(Shmueli et al., 2005). Due to the boundary at θ2 = 0, ΘN is not an open set, and hence the family is not regular (Brown, 1986, p. 2).

Theorem B.2. The Conway–Maxwell–Poisson family is not steep.

Proof. A necessary and sufficient condition for a non-regular exponential family to be steep is that

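$$\forall\,\theta\in\Theta_N\setminus\Theta_N^{o}:\quad \mathrm{E}_{\theta;\eta,g}\big(\lVert g(X)\rVert\big) = \infty,$$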

where $\Theta_N^{o}$ is the interior of ΘN, so their set difference is the part of the boundary of the natural parameter space that is contained in ΘN itself (Brown, 1986, Proposition 3.3, p. 72). For CMP, this boundary set is

$$\Theta_N\setminus\Theta_N^{o} = \{\theta\in\mathbb{R}^2 : \theta_2 = 0 \wedge \theta_1 < 0\}.$$

There, X ~ Geometric(p = 1 − exp(θ1)). Noting that X ≥ 0 a.s., log(X!) ≥ 0 a.s., and $\log(x!)\le(x+1)\log\big(\frac{x+1}{e}\big)+1$,

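$$\begin{aligned}\mathrm{E}_{\theta;\eta,g}\big(\lVert g(X)\rVert\big) &= \mathrm{E}_{\text{Geometric}(p=1-\exp(\theta_1))}\big(\lVert[X,\log(X!)]\rVert\big)\\ &\le \mathrm{E}_{\text{Geometric}(p=1-\exp(\theta_1))}\big(X+\log(X!)\big)\\ &\le \mathrm{E}_{\text{Geometric}(p=1-\exp(\theta_1))}\Big(X+(X+1)\log\big(\tfrac{X+1}{e}\big)+1\Big)\\ &\le \mathrm{E}_{\text{Geometric}(p=1-\exp(\theta_1))}\big(X+(X+1)^2+1\big) < \infty,\end{aligned}$$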

since the first and second moments of the geometric distribution are finite. Therefore, CMP is not steep.
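A quick numerical illustration (not a proof) of the two ingredients above, at the illustrative boundary point θ1 = log(1/2), an assumption; any θ1 < 0 behaves the same way. The bound on log(x!) is checked over a range of x, and the truncated expectation of X + log(X!) is seen to stabilize as the truncation point grows, consistent with a finite expectation. Function names are hypothetical.

```python
import math

def bound_holds(x):
    """log(x!) <= (x + 1) * log((x + 1) / e) + 1, with a small floating-point tolerance."""
    return math.lgamma(x + 1) <= (x + 1) * math.log((x + 1) / math.e) + 1 + 1e-9

def truncated_expectation(theta1, x_max):
    """Sum over x <= x_max of Pr(X = x) * (x + log x!) for X ~ Geometric(p = 1 - exp(theta1))."""
    q = math.exp(theta1)
    return sum((1 - q) * q ** x * (x + math.lgamma(x + 1)) for x in range(x_max + 1))

print(all(bound_holds(x) for x in range(10_000)))  # True
for x_max in (50, 200, 1000):
    print(x_max, truncated_expectation(math.log(0.5), x_max))
```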

Because the non-steep boundary corresponds to the most dispersed distribution that CMP can represent, maximum likelihood estimator properties for data which are highly overdispersed are not guaranteed.

References

  1. Barndorff-Nielsen OE. Information and Exponential Families in Statistical Theory. New York: John Wiley & Sons, Inc.; 1978. [Google Scholar]
  2. Batagelj V, Mrvar A. Pajek datasets. 2006 Available at http://vlado.fmf.uni-lj.si/pub/networks/data/. [Google Scholar]
  3. Bernard HR, Killworth PD, Sailer L. Informant accuracy in social network data IV: A comparison of clique-level structure in behavioral and cognitive network data. Social Networks. 1979–1980;2:191–218. [Google Scholar]
  4. Besag J. Spatial Interaction and the Statistical Analysis of Lattice Systems (with Discussion) Journal of the Royal Statistical Society, Series B. 1974;36:192–236. [Google Scholar]
  5. Brown LD. Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Lecture Notes — Monograph Series. Vol. 9. Hayward, California: Institute of Mathematical Statistics; 1986. [Google Scholar]
  6. Diesner J, Carley KM. Exploration of communication networks from the Enron email corpus; Proceedings of Workshop on Link Analysis, Counterterrorism and Security, SIAM International Conference on Data Mining 2005; 2005. pp. 21–23. [Google Scholar]
  7. Faust K. Very Local Structure in Social Networks. Sociological Methodology. 2007;37:209–256. [Google Scholar]
  8. Frank O, Strauss D. Markov Graphs. Journal of the American Statistical Association. 1986;81:832–842. [Google Scholar]
  9. Freeman LC, Freeman SC. A semi-visible college: Structural effects of seven months of EIES participation by a social networks community. In: Henderson MM, McNaughton MJ, editors. Electronic Communication: Technology and Impacts; AAAS Symposium; Washington, D.C.: American Association for Advancement of Science; pp. 77–85. [Google Scholar]
  10. Geyer CJ. Likelihood Inference for Spatial Point Processes. In: Barndorff-Nielsen OE, Kendall WS, van Lieshout M-CNM, editors. Stochastic Geometry: Likelihood and Computation. Vol. 80. Boca Raton, Florida: Chapman & Hall/CRC Press; 1999. pp. 79–141. Monographs on Statistics and Applied Probability. [Google Scholar]
  11. Geyer CJ, Thompson EA. Constrained Monte Carlo Maximum Likelihood for Dependent Data (with discussion) Journal of the Royal Statistical Society Series B. 1992;54:657–699. [Google Scholar]
  12. Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM. A survey of statistical network models. Foundations and Trends in Machine Learning. 2009;2:129–233. [Google Scholar]
  13. Goodreau SM, Kitts J, Morris M. Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks. Demography. 2008;45:103–125. doi: 10.1353/dem.0.0045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M. A statnet Tutorial. Journal of Statistical Software. 2008;24:1–26. [PMC free article] [PubMed] [Google Scholar]
  15. Handcock MS. Seattle, WA: University of Washington; 2003. Assessing Degeneracy in Statistical Models of Social Networks Working Paper report No. 39, Center for Statistics and the Social Sciences. [Google Scholar]
  16. Handcock MS. Statistical Exponential-Family Models for Signed Networks. 2006 Unpublished manuscript. [Google Scholar]
  17. Handcock MS, Gile KJ. Modeling Social Networks from Sampled Data. Annals of Applied Statistics. 2010;4:5–25. doi: 10.1214/08-AOAS221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks Version 3.0-1. The Statnet Project. 2012 doi: 10.18637/jss.v024.i03. http://www.statnet.org. [DOI] [PMC free article] [PubMed]
  19. Hanneke S, Fu W, Xing EP. Discrete Temporal Models of Social Networks. Electronic Journal of Statistics. 2010;4:585–605. [Google Scholar]
  20. Harris KM, Florey F, Tabor J, Bearman PS, Jones J, Udry JR. University of North Carolina; 2003. The National Longitudinal Study of Adolescent Health: Research Design Technical Report. [Google Scholar]
  21. Hoff PD. Bilinear Mixed Effects Models for Dyadic Data. Journal of the American Statistical Association. 2005;100:286–295. [Google Scholar]
  22. Holland PW, Leinhardt S. An Exponential Family of Probability Distributions for Directed Graphs. Journal of the American Statistical Association. 1981;76:33–65. [Google Scholar]
  23. Hunter DR, Goodreau SM, Handcock MS. Goodness of Fit for Social Network Models. Journal of the American Statistical Association. 2008;103:248–258. [Google Scholar]
  24. Hunter DR, Handcock MS. Inference in Curved Exponential Family Models for Networks. Journal of Computational and Graphical Statistics. 2006;15:565–583. [Google Scholar]
  25. Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software. 2008;24:1–29. doi: 10.18637/jss.v024.i03. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Kelly FP, Ripley BD. A Note on Strauss’s Model for Clustering. Biometrika. 1976;63:357–360. [Google Scholar]
  27. Krivitsky PN, Handcock MS. A Separable Model for Dynamic Networks. 2010. Under review. doi: 10.1111/rssb.12014.
  28. Krivitsky PN, Handcock MS, Morris M. Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models. Statistical Methodology. 2011;8:319–339. doi: 10.1016/j.stamet.2011.01.005.
  29. Krivitsky PN, Handcock MS, Raftery AE, Hoff PD. Representing Degree Distributions, Clustering, and Homophily in Social Networks with Latent Cluster Random Effects Models. Social Networks. 2009;31:204–213. doi: 10.1016/j.socnet.2009.04.001.
  30. Lambert D. Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing. Technometrics. 1992;34:1–14.
  31. Lazega E, Pattison PE. Multiplexity, generalized exchange and cooperation in organizations: a case study. Social Networks. 1999;21:67–90.
  32. Mariadassou M, Robin S, Vacher C. Uncovering Latent Structure in Valued Graphs: A Variational Approach. Annals of Applied Statistics. 2010;4:715–742.
  33. McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. Monographs on Statistics and Applied Probability, Vol. 37. Chapman & Hall/CRC; 1989.
  34. Morris M, Handcock MS, Hunter DR. Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software. 2008;24:1–24. doi: 10.18637/jss.v024.i04.
  35. Morris M, Kretzschmar M. Concurrent Partnerships and the Spread of HIV. AIDS. 1997;11:641–648. doi: 10.1097/00002030-199705000-00012.
  36. Newcomb TM. The Acquaintance Process. New York: Holt, Rinehart and Winston; 1961.
  37. Pattison P, Wasserman S. Logit Models and Logistic Regressions for Social Networks: II. Multivariate Relations. British Journal of Mathematical and Statistical Psychology. 1999;52:169–193. doi: 10.1348/000711099159053.
  38. Read KE. Cultures of the central highlands, New Guinea. Southwestern Journal of Anthropology. 1954;10:1–43.
  39. Rinaldo A, Fienberg SE, Zhou Y. On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic Journal of Statistics. 2009;3:446–484.
  40. Robbins H, Monro S. A Stochastic Approximation Method. The Annals of Mathematical Statistics. 1951;22:400–407.
  41. Robins G, Pattison P, Wasserman SS. Logit Models and Logistic Regressions for Social Networks: III. Valued Relations. Psychometrika. 1999;64:371–394.
  42. Robins G, Pattison P. Random graph models for temporal processes in social networks. Journal of Mathematical Sociology. 2001;25:5–41.
  43. Sampson SF. A Novitiate in a Period of Change: An Experimental and Case Study of Social Relationships. Ph.D. thesis. Ithaca, New York: Department of Sociology, Cornell University; 1968. (University Microfilm, No. 69–5775)
  44. Schweinberger M. Instability, Sensitivity, and Degeneracy of Discrete Exponential Families. Journal of the American Statistical Association. 2011;0:1–10. doi: 10.1198/jasa.2011.tm10747.
  45. Shmueli G, Minka TP, Kadane JB, Borle S, Boatwright P. A Useful Distribution for Fitting Discrete Data: Revival of the Conway–Maxwell–Poisson Distribution. Journal of the Royal Statistical Society: Series C. 2005;54:127–142.
  46. Snijders TAB. Markov chain Monte Carlo Estimation of Exponential Random Graph Models. Journal of Social Structure. 2002;3.
  47. Snijders TAB, van de Bunt GG, Steglich CEG. Introduction to Stochastic Actor-Based Models for Network Dynamics. Social Networks. 2010;32:44–60.
  48. Snijders TAB, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociological Methodology. 2006;36:99–153.
  49. Strauss D, Ikeda M. Pseudolikelihood Estimation for Social Networks. Journal of the American Statistical Association. 1990;85:204–212.
  50. Thomas AC, Blitzstein JK. Valued Ties Tell Fewer Lies: Why Not To Dichotomize Network Edges With Thresholds. 2011.
  51. van Duijn MAJ, Snijders TAB, Zijlstra BJH. p2: a random effects model with covariates for directed graphs. Statistica Neerlandica. 2004;58:234–254.
  52. Ward MD, Hoff PD. Persistent Patterns of International Commerce. Journal of Peace Research. 2007;44:157.
  53. Westveld AH, Hoff PD. A mixed effects model for longitudinal relational and network data, with applications to international trade and conflict. Annals of Applied Statistics. 2011;5:843–872.
  54. Wyatt D, Choudhury T, Bilmes J. Dynamic Multi-Valued Network Models for Predicting Face-to-Face Conversations. NIPS-09 Workshop on Analyzing Networks and Learning with Graphs. Neural Information Processing Systems (NIPS); 2009.
  55. Wyatt D, Choudhury T, Bilmes J. Discovering Long Range Properties of Social Networks with Multi-Valued Time-Inhomogeneous Models. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). Association for the Advancement of Artificial Intelligence; 2010.
  56. Zachary WW. An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research. 1977;33:452–473.
