Author manuscript; available in PMC: 2019 Jan 16.
Published in final edited form as: Stat Interface. 2019;12(1):181–191. doi: 10.4310/SII.2019.v12.n1.a15

Bayesian modeling and uncertainty quantification for descriptive social networks*

Thomas Nemmers¹, Anjana Narayan², Sudipto Banerjee³
PMCID: PMC6335039  NIHMSID: NIHMS996777  PMID: 30662582

Abstract

This article presents a simple and easily implementable Bayesian approach to model and quantify uncertainty in small descriptive social networks. While statistical methods for analyzing networks have seen burgeoning activity over the last decade or so, in fields ranging from the social sciences to genetics, such methods usually involve sophisticated stochastic models whose estimation requires substantial structure and information in the networks. At the other end of the analytic spectrum are purely descriptive methods based upon quantities and axioms from computational graph theory. In social networks, popular descriptive measures include, but are not limited to, the so-called Krackhardt’s axioms. Another approach, recently gaining attention, is the use of PageRank algorithms. While these descriptive approaches provide insight into networks with limited information, including small networks, there is, as yet, little research detailing a statistical approach for small networks. This article aims to contribute at the interface of Bayesian statistical inference and social network analysis by offering practicing social scientists a relatively straightforward Bayesian approach to account for uncertainty while conducting descriptive social network analysis. The emphasis is on computational feasibility and easy implementation using existing R packages, such as sna and rjags, that are available from the Comprehensive R Archive Network (https://cran.r-project.org/). Using fully Bayesian model-based inference, we analyze a network of 18 websites from the US and UK, previously analyzed with descriptive graph theory and no uncertainty quantification, to discern transnational identities.

Keywords: Bayesian modeling, Directed graphs, Krackhardt’s axioms, PageRank algorithms, Social network analysis, Uncertainty quantification

1. INTRODUCTION

Social network analysis (SNA), sometimes referred to as “structural analysis”, constitutes a key methodology in modern sociology. SNA models social phenomena using networks and studies them using graph theory. The vertices of the graph are referred to as nodes or actors, and the edges between these nodes represent links or connections defined by an underlying relation. Because of its ability to model potentially complex interactions in the underlying network, SNA has generated much interest in disciplines well beyond sociology, including, but not limited to, biology, economics, geography, organizational studies, political science, and computer science. Examples of social networks include media networks, friendship and acquaintance networks, disease transmission networks, and biological gene networks. Network analysis techniques can broadly be described as either descriptive or statistical. Descriptive methods exploit measures based upon the structure of the network to quantify properties such as how connected or centralized the network is and how important each node is within it. Such methods abound in the literature, and a comprehensive review is beyond the scope of this article [see, e.g., 28, 3, 13, 20, 25, 26, for excellent reviews]. Statistical (stochastic) methods formulate probability models for the network and include random graph models [24], latent space approaches [15], and several other probability models using matrix-variate distributions [8].

We outline some model-based strategies for quantifying uncertainty in descriptive SNA. Our particular focus is on small networks, where information is usually too limited to fit richer stochastic models such as latent class or exponential random graph models. We pursue Bayesian inference for small networks. This is appealing because it offers exact sampling-based inference, as opposed to asymptotic approximations that may be difficult to justify for small networks, and because it provides a direct and easily interpreted quantification of estimation uncertainty. We build a Bayesian hierarchical model for the variables associated with the network and compute the posterior distribution of some descriptive measures of the underlying graph using posterior predictive data replicates [see, e.g., 12]. There are a variety of such descriptive measures [see, e.g., 27, 28, for a detailed outline of such measures]; we opt for some common measures described in [28] and [18], often referred to as Krackhardt’s axiomatic measures. Our choice is governed primarily by the availability of these measures in statistical software packages. We will also outline simple Bayesian inference on “PageRanks”.

Our intended contribution lies at the interface between descriptive SNA and parametric Bayesian inference. The proposed approach will likely assist social scientists in gleaning information from small networks with limited information regarding the underlying processes generating structures within the network. We also emphasize easy implementation. The Bayesian models deliver clear interpretation and are implemented entirely within the R statistical computing environment using packages such as sna (for descriptive social networks) and rjags (for Bayesian computations).

Our application concerns a small network studied by [21] to better understand recent intercultural and social phenomena involving migrant students and how they develop their ethnic identities. Narayan et al. [21] conducted a descriptive structural analysis of a network constructed from websites of Hindu student groups within the United States and the United Kingdom to examine whether websites can help foster a transnational identity.

The article proceeds as follows. Section 2 gives a basic description of the network we will be revisiting. Section 3 begins with a brief review of familiar concepts in graph theory used in SNA. This is followed by a discussion of the quantities and measures in descriptive SNA that will be analyzed subsequently using Bayesian modeling, including uncertainty quantification with posterior predictive data replicates. Section 4 presents a Bayesian analysis of the network. We conclude the paper with a brief discussion in Section 5.

2. DATA

For the analysis, we used the dataset from Narayan et al. [21]. The network analyzed there is a collection of 18 websites of Hindu student groups; again, the focus of this paper is on small networks such as this one. Websites 3 and 14 were the national student group websites for the United Kingdom and the United States, respectively. Following links from those two websites, other websites were found that were local, functioning, and currently maintained: seven additional websites from the UK and nine from the US. The network is shown pictorially in Figure 1, and the corresponding adjacency matrix is shown in Table 1. A fully deterministic network analysis of this dataset can be found in Narayan et al. [21].

Figure 1.


The network of websites represented as a directed graph corresponding to the adjacency table. The labels correspond to sites in the table.

Table 1.

The adjacency matrix

Site Names 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 Cambridge 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Glasgow 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
3 UK-National 1 1 0 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0
4 Nottingham 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 STG 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
6 UCL 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
7 UMU 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
8 Warwick 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
9 Austin 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
10 Berkeley 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
11 CMU 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 Cornell 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 1 1
13 George Mason 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 1
14 US-National 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1
15 MIT 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 1
16 NCSU 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 1
17 PSU 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1
18 Stanford 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3. METHODS

3.1. Descriptive social network analysis

Descriptive SNA refers to measures derived from the structure of the network. These measures are easy to compute and interpret but do not account for the uncertainties inherent in modeling the relationships in the network. They can be regarded as local or global. Local measures determine the relative importance of a vertex (for example, how important or “central” a website is within a network), while global measures apply to the network as a whole, such as how connected (or not) the network is. After some basic definitions in Section 3.1.1, we discuss local measures using vertex analysis in Section 3.1.2 and PageRank algorithms in Section 3.1.3. We describe global measures using Krackhardt’s axiomatic analysis in Section 3.1.4.

3.1.1. Definitions and measures

We will model the social network as a directed graph, or digraph, $G = \{V, E\}$, where $V$ is the set of vertices representing websites and $E = \{(i, j) : i \in V, j \in V\}$ consists of the ordered pairs of websites connected by a link from website $i$ to website $j$. The set of edges $E$ describes a relation on the set of vertices $V$. The adjacency matrix, or relational table, has its $(i, j)$-th entry equal to 1 if $(i, j) \in E$ and 0 otherwise. Each edge can be thought of as having a direction. For any given vertex, its indegree is the number of edges coming into it, and its outdegree is the number of edges leaving from it. A path in $G$ is a sequence of vertices such that from each of its vertices there is an edge to the next vertex in the sequence. Also of interest is the distance between two vertices in a graph: this is the number of directed edges in the shortest path connecting them. Note that the shortest path is not always unique, but the distance is. The shortest paths are called geodesics, and the distance is the geodesic distance. A number of other measurements are defined in terms of distance; for example, the eccentricity of a vertex $v$ is the greatest distance between $v$ and any other vertex. Two vertices $u$ and $v$ are called connected if $G$ contains a directed path from $u$ to $v$; otherwise, they are called disconnected. A graph is called connected if every pair of vertices in the graph is connected. A maximal connected subgraph $S = (V_S, E_S)$ of $G$ is a connected subgraph with $E_S \subseteq E$ and $V_S \subseteq V$ such that, whenever $i \in V_S$ and $j \in V \setminus V_S$, neither $(i, j) \in E$ nor $(j, i) \in E$. A connected component of a graph $G$ is the same as a maximal connected subgraph of $G$. Each vertex belongs to exactly one connected component, as does each edge. The graph is weakly connected if replacing all of its directed edges with undirected edges produces a connected (undirected) graph. It is strongly connected, or strong, if it contains a directed path from $u$ to $v$ for every pair of vertices $u, v \in V$.
A weak component, $W$, is a subgraph of $G$ that becomes a maximal connected subgraph only when its directed edges are replaced with undirected edges. The strong components are the maximal connected subgraphs (with no need to replace the directed edges with undirected edges). A graph structure can be extended by assigning a weight to each edge of the graph; a digraph with weighted edges is called a network. Networks have many practical uses.
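The definitions above can be checked on the data directly. The Python sketch below (the article itself works in R) transcribes Table 1 as lists of out-links and computes in- and outdegrees, geodesic distances by breadth-first search, and weak connectivity. The helper names `degrees`, `geodesic_distances`, and `is_weakly_connected` are hypothetical and are not part of any R package.

```python
from collections import deque

# Out-links of each site, transcribed from the rows of Table 1 (sites numbered 1-18).
OUT_LINKS = {
    1: [3], 2: [3, 14], 3: [1, 2, 4, 5, 6, 7, 8, 14], 4: [],
    5: [3, 14], 6: [3, 11, 14], 7: [3, 14], 8: [3, 14], 9: [14],
    10: [3, 14, 18], 11: [], 12: [9, 11, 13, 14, 15, 16, 17, 18],
    13: [9, 10, 11, 12, 14, 15, 16, 17, 18],
    14: [9, 10, 11, 12, 13, 15, 16, 17, 18],
    15: [9, 10, 11, 12, 13, 14, 16, 17, 18],
    16: [9, 10, 12, 13, 14, 15, 17, 18],
    17: [9, 10, 11, 12, 13, 14, 15, 16, 18],
    18: [],
}


def degrees(out_links):
    """Return (indegree, outdegree) dictionaries keyed by vertex."""
    indeg = {v: 0 for v in out_links}
    for targets in out_links.values():
        for j in targets:
            indeg[j] += 1
    outdeg = {v: len(targets) for v, targets in out_links.items()}
    return indeg, outdeg


def geodesic_distances(out_links, source):
    """Breadth-first search from source; unreachable vertices are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in out_links[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_weakly_connected(out_links):
    """Drop edge directions, then check that one search reaches every vertex."""
    undirected = {v: set(targets) for v, targets in out_links.items()}
    for i, targets in out_links.items():
        for j in targets:
            undirected[j].add(i)
    start = next(iter(undirected))
    return len(geodesic_distances(undirected, start)) == len(undirected)


indeg, outdeg = degrees(OUT_LINKS)
print(indeg[14], outdeg[3], sum(outdeg.values()))  # 13 8 76
print(max(geodesic_distances(OUT_LINKS, 1).values()))  # eccentricity of site 1: 3
print(is_weakly_connected(OUT_LINKS))  # True
```

The network is weakly but not strongly connected: sites 4, 11, and 18 have no out-links, so nothing can be reached from them.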

3.1.2. Vertex analysis using digraphs

Within graph theory and network analysis, there are various measures for the centrality of a vertex within a graph that determine the relative importance of that vertex (for example, how important a website is within a network). These measures attempt to quantify the prominence of an individual node embedded in a network. The measures investigated here include, among others, (i) degree, (ii) betweenness, (iii) closeness, (iv) eigenvector centrality, and (v) structure statistics.

The calculation of overall network centralization starts by determining the centrality of each individual in the network. There are many different definitions of centrality in the literature [28], and the choice between measures is based on the nature of the relationships. The most basic definition is based on degree: nodes that “receive” or “send” more connections are more central than those that do not. Indegree and outdegree can be used; in addition, the total degree can be used, which is the sum of the indegree and outdegree. This total degree is also called the Freeman measure of connectedness or centrality [10].

The betweenness, $C_B(v)$, of vertex $v$ is

$$C_B(v) = \sum_{i \neq v \neq j,\; i \neq j} \frac{g_{ivj}}{g_{ij}},$$

where $g_{ij}$ is the number of geodesics from $i$ to $j$ and $g_{ivj}$ is the number of those geodesics that pass through $v$. Conceptually, vertices with high betweenness lie on a large number of non-redundant shortest paths between other vertices [9].
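The betweenness formula can be evaluated by brute force on small graphs: one breadth-first search per source gives geodesic distances and geodesic counts, and $v$ lies on an $i$-to-$j$ geodesic exactly when $d(i, v) + d(v, j) = d(i, j)$, in which case $g_{ivj} = g_{iv}\, g_{vj}$. This Python sketch sums over ordered pairs, as appropriate for a digraph; the helper names are hypothetical and we do not claim it matches the implementation in sna.

```python
from collections import deque


def geodesic_counts(adj, s):
    """BFS from s: returns (dist, sigma), where sigma[w] = number of geodesics s -> w."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """C_B(v): sum over ordered pairs (i, j), i != v != j, of g_ivj / g_ij."""
    counts = {s: geodesic_counts(adj, s) for s in adj}
    scores = {v: 0.0 for v in adj}
    for i in adj:
        dist_i, sig_i = counts[i]
        for j in dist_i:  # only pairs with at least one geodesic (g_ij > 0)
            if j == i:
                continue
            for v in dist_i:
                if v in (i, j):
                    continue
                dist_v, sig_v = counts[v]
                # v is on an i-j geodesic iff d(i,v) + d(v,j) == d(i,j);
                # the number of such geodesics is g_iv * g_vj.
                if j in dist_v and dist_i[v] + dist_v[j] == dist_i[j]:
                    scores[v] += sig_i[v] * sig_v[j] / sig_i[j]
    return scores


# Diamond: a -> b -> d and a -> c -> d give two geodesics from a to d.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(betweenness(diamond))  # {'a': 0.0, 'b': 0.5, 'c': 0.5, 'd': 0.0}
```

Each of `b` and `c` carries half of the two geodesics from `a` to `d`, so each scores 0.5.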

The next measure of centrality is closeness. The closeness of a vertex v is defined as

$$C_C(v) = \frac{|V| - 1}{\sum_{i:\, i \neq v} d(v, i)},$$

where $|V|$ is the cardinality of $V$ and $d(i, j)$ is the geodesic distance between $i$ and $j$ (where defined). Intuitively, closeness provides an index of the extent to which a given vertex has short paths to all other vertices in the graph.
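A minimal Python sketch of closeness for a directed graph stored as out-link lists. When some vertex is unreachable from $v$, the sum in the denominator is undefined; returning `None` in that case is our assumption, and software packages handle it in different ways. The function names are hypothetical.

```python
from collections import deque


def distances_from(adj, v):
    """Geodesic distances from v by breadth-first search (unreachable vertices omitted)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """C_C(v) = (|V| - 1) / sum_{i != v} d(v, i); None when some d(v, i) is undefined."""
    dist = distances_from(adj, v)
    if len(dist) < len(adj):
        return None  # some vertex is unreachable from v
    return (len(adj) - 1) / sum(dist.values())  # d(v, v) = 0 adds nothing


cycle = {0: [1], 1: [2], 2: [0]}  # directed 3-cycle
path = {0: [1], 1: [2], 2: []}    # directed path 0 -> 1 -> 2
print(closeness(cycle, 0))                      # 2 / (1 + 2) = 0.666...
print(closeness(path, 0), closeness(path, 1))   # 0.666... None
```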

Two other measures of centrality are the Harary centrality and the eigenvector centrality of a vertex $v$. The Harary centrality of $v$ is

$$C_H(v) = \frac{1}{\max_{u} d(v, u)},$$

where $d(v, u)$ is the geodesic distance from $v$ to $u$. Eigenvector centrality scores correspond to the entries of the first eigenvector of the graph’s adjacency matrix, i.e., the eigenvector corresponding to the largest eigenvalue. These scores may, in turn, be interpreted as arising from a reciprocal process in which the centrality of each actor is proportional to the sum of the centralities of the actors to whom it is connected [14].
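A Python sketch of the Harary and eigenvector centralities on a hypothetical undirected star. For eigenvector centrality we use power iteration on $A + I$: the shift leaves the eigenvectors unchanged, and for a connected undirected graph $A + I$ is primitive, so the iteration converges even on bipartite graphs such as a star, where iterating $A$ alone oscillates. This shift is our choice and not necessarily what R packages do; all names are hypothetical.

```python
import math
from collections import deque


def eccentricity(adj, v):
    """Largest geodesic distance from v, or math.inf if some vertex is unreachable."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values()) if len(dist) == len(adj) else math.inf


def harary(adj, v):
    """C_H(v) = 1 / max_u d(v, u); assumes at least two vertices.
    An unreachable vertex makes the eccentricity infinite, giving 0 here (assumption)."""
    ecc = eccentricity(adj, v)
    return 0.0 if ecc == math.inf else 1.0 / ecc


def eigenvector_centrality(A, iters=500):
    """Power iteration x <- (A + I) x, rescaled each step so the largest score is 1."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(y)
        x = [value / top for value in y]
    return x


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # undirected star, stored both ways
A = [[1 if j in star[i] else 0 for j in range(4)] for i in range(4)]
print(harary(star, 0), harary(star, 1))  # 1.0 0.5
print(eigenvector_centrality(A))  # centre 1.0, leaves approximately 0.577 (1/sqrt(3))
```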

Also useful are the structure statistics, computed as follows. Let $d(i, j)$ be the geodesic distance from vertex $i$ to vertex $j$ in $G$. The structure statistics of $G$ are then given by the sequence $\{s_0, \ldots, s_{n-1}\}$, where $n = |V|$ and

$$s_i = \frac{1}{n^2} \sum_{j \in V} \sum_{k \in V} I\big(d(j, k) \le i\big),$$

where $I(d(j, k) \le i) = 1$ if $d(j, k) \le i$ and 0 otherwise. Intuitively, $s_i$ is the fraction of $G$ that lies within distance $i$ of a randomly chosen vertex. Structure statistics have been of particular importance to biased net theorists because of the link with Rapoport’s original tracing model [28]. They may also be used, along with component distributions or connectedness scores, as descriptive indices of connectivity at the graph level.
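The structure statistics follow directly from all-pairs geodesic distances. The Python sketch below computes them by breadth-first search from every vertex; pairs with no connecting path have infinite distance and never count. Helper names are hypothetical.

```python
from collections import deque


def distances_from(adj, v):
    """Geodesic distances from v (unreachable vertices omitted)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def structure_statistics(adj):
    """s_i = (1/n^2) * #{(j, k) : d(j, k) <= i}, for i = 0, ..., n - 1."""
    n = len(adj)
    finite = [d for v in adj for d in distances_from(adj, v).values()]
    return [sum(1 for d in finite if d <= i) / n ** 2 for i in range(n)]


cycle = {0: [1], 1: [2], 2: [0]}  # directed 3-cycle
print(structure_statistics(cycle))  # [0.333..., 0.666..., 1.0]
```

Because $d(j, j) = 0$, the first statistic is always $s_0 = 1/n$.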

The centrality measures can also be aggregated to obtain a global centrality index for the entire network. For example, centralization refers to the extent to which the network is concentrated on one vertex or a group of vertices. Numerically, a centralized network is one in which one or a few vertices have considerably higher centrality scores than the others. The centralization, or global centrality index, of a graph $G$ for a centrality measure $C(v)$ is defined as

$$C^*(G) = \sum_{i \in V} \Big| C(i) - \max_{v \in V} C(v) \Big|,$$

that is, the sum of the absolute deviations of the centrality scores from their maximum on $G$ [10].
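The centralization formula can be applied to any set of vertex scores. The Python sketch below uses Freeman total degree on a hypothetical directed star, for which the formula gives $0 + 3 \times (3 - 1) = 6$; no normalization by a maximum possible value is applied, matching the formula as written.

```python
def centralization(scores):
    """C*(G) = sum_i |C(i) - max_v C(v)| for a dict of vertex centrality scores."""
    top = max(scores.values())
    return sum(abs(c - top) for c in scores.values())


def total_degree(out_links):
    """Freeman total degree: indegree + outdegree of each vertex."""
    deg = {v: len(targets) for v, targets in out_links.items()}
    for targets in out_links.values():
        for j in targets:
            deg[j] += 1
    return deg


out_star = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
print(centralization(total_degree(out_star)))  # 6
```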

Tables showing all of the aforementioned measures for this network have been presented in detail in [21] and hence are not reproduced here.

3.1.3. PageRank algorithm

The PageRank algorithm [6] can be added to the other, more usual vertex analysis measures; it can be regarded as a centrality measure of a vertex within a network. The algorithm starts by creating a transition probability matrix, $P$, whose rows and columns correspond to the vertices of the network, composed of $n$ vertices in total. For the $i$-th row, $r_i$ denotes the number of vertices to which vertex $i$ has an edge, or hyperlink, so $r_i$ is the outdegree of $i$. Now,

$$(P)_{ij} = \begin{cases} 1/r_i & \text{if } i \text{ has a link to } j, \\ 0 & \text{otherwise,} \end{cases}$$

for all $i, j$. If vertex $i$ has an outdegree of 0, then each entry in the $i$-th row of $P$ is simply $1/n$. It is easy to see that the entries in each row of $P$ sum to 1; equivalently, denoting by $\mathbf{1}$ the $n \times 1$ vector with all entries equal to 1 ($n$ being the number of rows or columns of $P$),

$$P\mathbf{1} = \mathbf{1}.$$

Thus, $\mathbf{1}$ is an eigenvector of $P$ corresponding to the eigenvalue 1. Matrices with this property are said to be row-stochastic, and this transition probability matrix corresponds to a discrete-time Markov chain with $n$ states. It is well known that if $P$ is a regular transition matrix, i.e., some power of $P$ has all its entries positive, then

$$\lim_{m \to \infty} P^m = W,$$

where $W$ is a matrix all of whose rows are the same [see, e.g., 19, 17, 1]. Denoting any one of these rows by $\alpha$, $\alpha$ is the stationary distribution of $P$. If the powers of $P$ converge in this sense to a stationary distribution $\alpha$, then the PageRank algorithm returns the rank of the $i$-th vertex as the $i$-th entry of $\alpha$.
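A small numerical check of the limit above, written as a Python sketch. The hypothetical three-vertex graph below is irreducible and has cycles of lengths 2 and 3, so its transition matrix is regular and the rows of $P^m$ become identical. The matrix helpers are written out by hand to keep the example dependency-free, and `transition_matrix` gives zero-outdegree rows the value $1/n$ as in the definition of $P$.

```python
def transition_matrix(A):
    """Row-normalize a 0/1 adjacency matrix; rows with outdegree 0 become 1/n."""
    n = len(A)
    P = []
    for row in A:
        r = sum(row)  # r_i, the outdegree of vertex i
        P.append([a / r for a in row] if r > 0 else [1.0 / n] * n)
    return P


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matrix_power(P, m):
    result = P
    for _ in range(m - 1):
        result = matmul(result, P)
    return result


# Links 0->1, 1->0, 1->2, 2->0: irreducible with cycles of length 2 and 3 (so regular).
A = [[0, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
P = transition_matrix(A)
W = matrix_power(P, 200)
print([round(w, 6) for w in W[0]])  # [0.4, 0.4, 0.2]
print(all(abs(W[0][j] - W[i][j]) < 1e-9 for i in range(3) for j in range(3)))  # True
```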

To find $\alpha$ for a row-stochastic transition probability matrix $P$, the following results provide a solution. First, if $W$, and therefore $\alpha$, exists, then the vector $\pi$ such that

$$P^{\top}\pi = \pi$$

is equal to $k\alpha$, where $k$ is a scalar constant. Since $\alpha$ is a row of a row-stochastic matrix,

$$\alpha^{\top}\mathbf{1} = 1.$$

Thus, we can find k from the relation

$$\left(\frac{1}{k}\right)\pi^{\top}\mathbf{1} = 1$$

and $(1/k)\pi$ is the stationary distribution. Notice that $\pi$ is simply an eigenvector of $P^{\top}$ associated with the eigenvalue 1.
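The normalization argument can also be carried out directly: solve $P^{\top}\pi = \pi$ and rescale so the entries sum to 1. The Python sketch below does both at once by replacing one (redundant) equation of $(P^{\top} - I)\pi = 0$ with the constraint $\mathbf{1}^{\top}\pi = 1$, then applying Gauss-Jordan elimination with partial pivoting. It assumes $P$ is irreducible so that the solution is unique; the three-vertex chain used as a check is hypothetical, and its stationary distribution, (0.4, 0.4, 0.2), can be verified by hand.

```python
def stationary_distribution(P):
    """Solve P^T pi = pi together with sum(pi) = 1 (assumes P irreducible)."""
    n = len(P)
    # Augmented system [P^T - I | 0]; row i holds P[j][i] - delta_ij.
    M = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]  # replace a redundant equation by the normalization
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
print([round(p, 6) for p in stationary_distribution(P)])  # [0.4, 0.4, 0.2]
```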

The second result is that every irreducible transition probability matrix has a unique stationary distribution (if, in addition, it is aperiodic, its powers $P^m$ converge as above). A transition probability matrix is irreducible if there exists no permutation matrix $M$ such that

$$M P M^{\top} = \begin{bmatrix} X & Y \\ O & Z \end{bmatrix},$$

where X and Z are both square matrices and O has all entries equal to zero (X, Y, O, and Z are all dimensionally compatible). Thus, if the transition matrix can be shown to be irreducible, the eigenvector of its transpose that is associated with the eigenvalue of 1 is its stationary distribution, after appropriate scaling. The entries of the stationary distribution are the ranks for each vertex. The vertex with the highest rank is deemed the most important vertex in the network as it has the highest probability of getting a hit were there a web search over this network.

While the original transition probability matrix $P$ may not always be irreducible, the algorithm actually works with a modified transition probability matrix, $\tilde{P}$:

$$\tilde{P} = dP + (1 - d)\left(\frac{1}{n}\right)J,$$

where $d \in [0, 1]$ and $J$ is an $n \times n$ matrix with all entries equal to one. Conventionally, $d$ is set to 0.85 [6]. Notice that when $d < 1$, all of the entries of $\tilde{P}$ are strictly positive, so $\tilde{P}$ is irreducible (indeed regular) and its powers always converge to a stationary distribution. The stationary distribution of $\tilde{P}$ gives the PageRanks for each vertex [19, 17, 1].
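The damped iteration can be sketched in Python as follows, assuming $d = 0.85$ and applying $\tilde{P}$ implicitly rather than forming the $n \times n$ matrix; vertices with outdegree 0 spread their mass uniformly, as in the definition of $P$. Table 1 is transcribed as out-link lists with the table's site numbering. This is our own illustration, not the implementation behind Table 2, and we have not checked that it reproduces those values to the reported precision.

```python
# Out-links of each site, transcribed from the rows of Table 1.
OUT_LINKS = {
    1: [3], 2: [3, 14], 3: [1, 2, 4, 5, 6, 7, 8, 14], 4: [],
    5: [3, 14], 6: [3, 11, 14], 7: [3, 14], 8: [3, 14], 9: [14],
    10: [3, 14, 18], 11: [], 12: [9, 11, 13, 14, 15, 16, 17, 18],
    13: [9, 10, 11, 12, 14, 15, 16, 17, 18],
    14: [9, 10, 11, 12, 13, 15, 16, 17, 18],
    15: [9, 10, 11, 12, 13, 14, 16, 17, 18],
    16: [9, 10, 12, 13, 14, 15, 17, 18],
    17: [9, 10, 11, 12, 13, 14, 15, 16, 18],
    18: [],
}


def pagerank(out_links, d=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of P~ = d P + (1 - d)(1/n) J by power iteration."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation term: (1 - d)/n * sum(rank), and sum(rank) == 1.
        new = {v: (1.0 - d) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = d * rank[v] / len(targets)  # 1/r_i on each out-link
                for w in targets:
                    new[w] += share
            else:
                for w in nodes:  # dangling vertex: row of P is 1/n
                    new[w] += d * rank[v] / n
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


ranks = pagerank(OUT_LINKS)
top = sorted(ranks, key=ranks.get, reverse=True)[:3]
print([(v, round(ranks[v], 2)) for v in top])
```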

The above deterministic algorithm was implemented for the adjacency matrix in Table 1, and the PageRanks for the network are presented in Table 2. The vertices are arranged from the highest to the lowest rank (rounded to two decimal places).

Table 2.

PageRanks for the network

Vertex PageRank
14 0.17
3 0.11
18 0.07
9 0.06
11 0.06
10 0.05
12 0.05
13 0.05
14 0.05
15 0.05
16 0.05
17 0.05
1 0.03
2 0.03
4 0.03
5 0.03
6 0.03
7 0.03
8 0.03

3.1.4. Krackhardt’s axiomatic analysis

The preceding section devoted itself to analyzing the centrality properties in the structure of a network by locating and ranking nodes in the network according to their importance. We now turn to global measures describing the level of connectedness and the hierarchy of a network’s structure. These are evaluated using Krackhardt’s axiomatic approach [18]. Rather than assign an importance measure to each vertex, we now provide a score to the entire network. A basic method for assessing connectedness is constructing the reachability table or matrix for a network. Two vertices reach each other if there exists a path connecting them. A reachability graph is constructed from a given graph or network by joining or linking vertices that can be reached from one another in the original graph. For the reachability matrix, the (i, j)-th entry is 1 if vertex j can be reached from vertex i in the original network.

In addition to reachability, Krackhardt’s measures can be used. There are four such measures developed for studying the underlying structure: connectedness, hierarchy, efficiency, and least upper bound (LUB), or “LUBness”. These four measures quantify the four conditions Krackhardt considered necessary for a graph to be considered a hierarchy or an out-tree. First, connectedness is defined as:

$$C_K = 1 - \frac{D}{n(n-1)/2},$$

where $D$ is the number of pairs of vertices that cannot reach one another in a “weak” sense, meaning that all directed edges are converted to undirected edges before assessing reachability. In other words, Krackhardt’s connectedness for a digraph $G$ is the fraction of all unordered pairs of nodes $\{i, j\}$ that are joined by a path once edge directions are ignored. The connectedness score ranges from 0, for a null (edgeless) graph, to 1, for a weakly connected graph (which includes every strongly connected graph).
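Krackhardt connectedness only needs the weak components. The Python sketch below finds them with a union-find structure (our choice of algorithm) and counts the unordered pairs that fall within a common component; the helper names are hypothetical.

```python
def weak_components(out_links):
    """Vertex sets of the weak components (union-find, edge directions ignored)."""
    parent = {v: v for v in out_links}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, targets in out_links.items():
        for j in targets:
            parent[find(i)] = find(j)
    groups = {}
    for v in out_links:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def krackhardt_connectedness(out_links):
    """C_K = 1 - D / (n(n-1)/2), D = number of unordered pairs not weakly reachable."""
    n = len(out_links)
    total_pairs = n * (n - 1) / 2
    linked_pairs = sum(len(c) * (len(c) - 1) / 2 for c in weak_components(out_links))
    return 1 - (total_pairs - linked_pairs) / total_pairs


g = {"a": ["b"], "b": [], "c": []}  # c is isolated
print(krackhardt_connectedness(g))  # 1 - 2/3 = 0.333...
```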

The Krackhardt efficiency of a graph $G$ is computed as follows. Suppose the weak components of $G$ are $G_1, G_2, \ldots, G_m$, and denote the cardinalities of the vertex sets by $|V(G)| = N$ and $|V(G_i)| = N_i$, for $i = 1, \ldots, m$. Then the Krackhardt efficiency of $G$ is given by

$$E_K = 1 - \frac{|E| - \sum_{i=1}^{m} (N_i - 1)}{N(N-1)/2 - \sum_{i=1}^{m} (N_i - 1)},$$

where |E| is the number of edges in G. A high value, close to 1, implies that the network has the minimal number of edges in order for its weak components to be connected. Efficiency can also be interpreted as 1 minus the proportion of possible “extra” edges above those needed to connect the weak components. A graph with an efficiency of 1 has precisely as many edges as are needed to connect its components; as additional edges are added after that, the efficiency gradually falls towards 0.
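The efficiency formula above translates directly once the weak components are known. This Python sketch implements the formula as written here (with $N(N-1)/2$ in the denominator); it is undefined when the denominator is zero, for example for a connected graph on two vertices, and software packages may normalize differently. Names are hypothetical.

```python
def weak_components(out_links):
    """Vertex sets of the weak components (union-find, edge directions ignored)."""
    parent = {v: v for v in out_links}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, targets in out_links.items():
        for j in targets:
            parent[find(i)] = find(j)
    groups = {}
    for v in out_links:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def krackhardt_efficiency(out_links):
    """E_K = 1 - (|E| - sum(N_i - 1)) / (N(N-1)/2 - sum(N_i - 1))."""
    N = len(out_links)
    n_edges = sum(len(targets) for targets in out_links.values())
    needed = sum(len(c) - 1 for c in weak_components(out_links))  # edges to connect components
    return 1 - (n_edges - needed) / (N * (N - 1) / 2 - needed)


path = {"a": ["b"], "b": ["c"], "c": []}        # exactly the edges needed
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}    # one extra edge
print(krackhardt_efficiency(path), krackhardt_efficiency(cycle))  # 1.0 0.0
```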

Hierarchy measures quantify the extent of asymmetry in a structure. To understand symmetric directed paths, first note that each directed path has a “sender”, the first vertex in the sequence, and a “receiver”, the final vertex in the sequence. A path is said to be symmetric if there exists a directed path from its receiver back to its sender; this return path is then trivially symmetric as well. All other paths are said to be asymmetric. The Krackhardt hierarchy is defined as the fraction of paths that are asymmetric. The closer this value is to 1, the more hierarchical, in a conventional sense, are the relationships in the communications network: when website A links, directly or through other sites, to website B, there is usually no path of links from B back to A.
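Following the verbal definition above, the hierarchy score can be computed from reachability sets: collect the ordered pairs $(i, j)$ with a directed path from $i$ to $j$, and report the fraction with no path back from $j$ to $i$. This Python sketch illustrates that definition; the helper names are hypothetical.

```python
from collections import deque


def reachable_from(out_links, v):
    """Vertices reachable from v by a directed path (v itself excluded)."""
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in out_links[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(v)
    return seen


def krackhardt_hierarchy(out_links):
    """Fraction of ordered reachable pairs (i, j) with no directed path from j back to i."""
    reach = {v: reachable_from(out_links, v) for v in out_links}
    pairs = [(i, j) for i in out_links for j in reach[i]]
    if not pairs:
        return None  # no paths at all, so the fraction is undefined
    asymmetric = sum(1 for i, j in pairs if i not in reach[j])
    return asymmetric / len(pairs)


print(krackhardt_hierarchy({0: [1], 1: [2], 2: []}))            # 1.0 (a chain of command)
print(krackhardt_hierarchy({0: [1], 1: [2], 2: [0]}))           # 0.0 (everything reciprocated)
print(krackhardt_hierarchy({"a": ["b"], "b": ["a", "c"], "c": []}))  # 0.5
```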

The least upper boundedness, or “LUBness”, is defined as follows. A node $k$ is said to be an “upper bound” for two nodes $i$ and $j$ if the directed paths $k \to i$ and $k \to j$ exist in $G$. If no such node exists, then $i$ and $j$ do not have an upper bound. An upper bound $l$ is a least upper bound, or LUB, for $i$ and $j$ if, for every upper bound $k$ of $i$ and $j$, $l$ lies on at least one of the $k \to i$ paths and at least one of the $k \to j$ paths. When all vertex pairs possess a least upper bound, Krackhardt’s LUBness equals 1; in general, it approaches 0 as this condition is increasingly violated. Krackhardt describes the LUB as the common “boss” of two nodes, either directly or up through the network.

The results of Krackhardt’s axiomatic analysis have also been presented in detail in [21] and hence are not reproduced here. Instead, we now turn to the statistical models for uncertainty quantification.

3.2. Bayesian modeling

3.2.1. Bayesian hierarchical modeling

To statistically model the various network analysis measures, the adjacency matrix must first be modeled. Let Yij be the (i, j)-th entry of an n × n adjacency matrix Y associated with the network. We will assume a binary adjacency matrix, i.e.,

$$Y_{ij} = \begin{cases} 1 & \text{if there is an edge going from node } i \text{ to node } j, \\ 0 & \text{otherwise.} \end{cases}$$

Note that the edges are directed, so the matrix $Y$ need not be symmetric: a directed edge from node $i$ to node $j$ implies $Y_{ij} = 1$ but does not imply $Y_{ji} = 1$. Each entry in the adjacency matrix is modeled using logistic regression,

$$Y_{ij} \mid \pi_{ij} \sim \mathrm{Bern}(\pi_{ij}), \qquad \mathrm{logit}(\pi_{ij}) = x_{ij}^{\top}\beta, \qquad (1)$$

for $i, j = 1, 2, \ldots, n$, where $x_{ij}$ is a $p \times 1$ vector of covariates and $\beta$ is the corresponding vector of slopes. The posterior distribution of $\beta$ is

$$p(\beta \mid Y) \propto p(\beta) \times \prod_{i,j=1}^{n} \mathrm{Bern}\big(Y_{ij} \mid \mathrm{logit}^{-1}(x_{ij}^{\top}\beta)\big),$$

where p(β) is the prior on β.

Bayesian logistic regression and the effect of priors (weakly informative, non-informative, and informative) have been explored in depth by Gelman et al. in [11] and also by other authors in the context of noninformative (including flat or improper) priors for generalized linear models [see, e.g., 7, 2, 16, 30, among others]. A full exploration of different prior choices is not attempted here. Instead, we focus on some typical prior choices for the $p \times 1$ vector $\beta$.

One specification assumes a weakly informative Gaussian prior, $\beta \sim N(0, \tau^2 I_p)$, where $\tau^2$ is a positive scalar. A very large value of $\tau^2$ yields a vague proper prior for $\beta$, and letting $\tau^2 \to \infty$ results in the improper uniform prior. Necessary and sufficient conditions for the propriety of the posterior under improper priors have been established rigorously in [7]. Here, we do not use improper priors, so our posterior distributions are proper.

Another proper specification assumes a multivariate Normal-Wishart prior

$$\beta \mid \Sigma \sim N_p(0, \Sigma), \qquad \Sigma^{-1} \sim \mathrm{Wish}(I_p, p + 1),$$

where Σ is an unknown p × p positive definite matrix, and Ip is the p × p identity matrix. Markov chain Monte Carlo (MCMC) methods are used to sample from p(β|Y) or p(β, Σ | Y), as the case may be, for each of the above models [see, e.g., 12]. These can be implemented in a number of available R packages; for the specific analysis in this paper we used rjags.
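rjags relies on the samplers built into JAGS. As a self-contained illustration of sampling from $p(\beta \mid Y)$ under model (1) with the Gaussian prior, the Python sketch below swaps in a plain random-walk Metropolis algorithm; it is not the sampler used in the article, and the toy data, step size, seed, and iteration counts are arbitrary assumptions.

```python
import math
import random


def log_posterior(beta, X, y, tau2):
    """log p(beta | Y) up to a constant under model (1) with beta ~ N(0, tau2 I)."""
    lp = -sum(b * b for b in beta) / (2.0 * tau2)
    for x, y_ij in zip(X, y):
        eta = sum(xk * bk for xk, bk in zip(x, beta))
        # log Bern(y | logit^{-1}(eta)) = y * eta - log(1 + e^eta), computed stably
        lp += y_ij * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis(X, y, tau2=100.0, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis for beta; returns (draws, acceptance rate)."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y, tau2)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y, tau2)
        # Accept with probability min(1, exp(candidate - current)); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(list(beta))
    return draws, accepted / n_iter


# Hypothetical toy data: intercept plus one binary covariate, 20 dyads per group.
X = [[1, 0]] * 20 + [[1, 1]] * 20
y = [1] * 10 + [0] * 10 + [1] * 4 + [0] * 16
draws, acc = metropolis(X, y)
kept = draws[1000:]  # discard burn-in
post_mean = [sum(d[k] for d in kept) / len(kept) for k in range(2)]
print(acc, post_mean)
```

With these toy data the second coefficient should settle near $\mathrm{logit}(0.2) \approx -1.39$, since the prior is nearly flat; in a real analysis one would also run several chains and check convergence diagnostics.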

3.2.2. Predictive distribution for network analysis measures

One possible approach to quantifying uncertainty in descriptive SNA is to evaluate the proposed descriptive measures on replicated datasets generated from the posterior predictive distribution [12]. A replicated data matrix is defined as the random $n \times n$ adjacency matrix $Y^{rep}$ with entries $Y^{rep}_{ij}$ such that $p(Y^{rep} \mid Y, \theta) = p(Y^{rep} \mid \theta)$, where $\theta$ represents the set of model parameters to be estimated. Thus, $Y^{rep}$ is assumed to be generated from the same model as the observed data and to be conditionally independent of $Y$ given $\theta$ (and hence given the $\pi_{ij}$). One can also regard $Y^{rep}$ as a future or alternate adjacency matrix that could have been observed under the assumed probability model for the realized $Y$.

Since uncertainty quantification needs to account for the uncertainty in estimating θ, we compute the posterior predictive distribution of the replicated adjacency matrix,

$$p(Y^{rep} \mid Y) = \int p(Y^{rep} \mid \theta)\, p(\theta \mid Y)\, d\theta, \qquad (2)$$

where $Y^{rep}_{ij} \mid \pi \overset{\text{ind}}{\sim} \mathrm{Bern}(\pi_{ij})$ and $\pi_{ij}$ is determined by $\beta$. We sample from (2) using samples from $p(\beta \mid Y)$: for each sampled $\beta^{(s)} \sim p(\beta \mid Y)$, $s = 1, 2, \ldots, S$, we compute the corresponding $\pi^{(s)}$ and draw $Y^{rep,(s)}_{ij} \sim \mathrm{Bern}(\pi^{(s)}_{ij})$ for $i, j = 1, 2, \ldots, n$. The collection $\{Y^{rep,(s)} : s = 1, 2, \ldots, S\}$ then consists of samples of the replicated data from the distribution in (2). Each of the descriptive measures in Section 3.1 is fully determined by the graph $G$, and hence by the adjacency matrix $Y$, so we can express it as a function $T(Y)$. Computing $T(Y^{rep,(s)})$ for each $s$ yields samples from $p(T(Y^{rep}) \mid Y)$.
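The replicate-and-recompute scheme can be sketched in Python as follows. The covariate function `x(i, j)` returning $x_{ij}$, the intercept-only covariates, and the $\beta$ draws in the usage line are hypothetical; in practice the draws would come from the MCMC output. Setting the diagonal of $Y^{rep}$ to zero is an assumption (the observed network has no self-links), and $T$ here is simply the edge count.

```python
import math
import random


def replicate_adjacency(beta, x, n, rng):
    """Draw Y^rep with Y^rep_ij ~ Bern(logit^{-1}(x(i, j)' beta)) independently."""
    Y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # self-links fixed at 0 (assumption)
                eta = sum(a * b for a, b in zip(x(i, j), beta))
                Y[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
    return Y


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def posterior_predictive_summary(beta_draws, x, n, T, seed=0):
    """Median and 95% interval of T(Y^rep), one replicate per posterior draw."""
    rng = random.Random(seed)
    values = sorted(T(replicate_adjacency(b, x, n, rng)) for b in beta_draws)
    return quantile(values, 0.5), (quantile(values, 0.025), quantile(values, 0.975))


def edge_count(Y):
    return sum(map(sum, Y))


beta_draws = [[-0.3], [-0.2], [-0.1], [0.0], [0.1]]  # hypothetical posterior draws
median, (lo, hi) = posterior_predictive_summary(beta_draws, lambda i, j: (1,), 5, edge_count)
print(median, lo, hi)
```

Any descriptive measure from Section 3.1 could be passed as `T` in place of `edge_count`.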

3.2.3. Bayesian PageRank model

Posterior predictive distributions for replicated adjacency matrices can also offer uncertainty quantification for PageRanks, by applying the PageRank algorithm to the replicated datasets. However, the $\pi_{ij}$’s from (1) are not modeled jointly and, hence, do not necessarily form a transition matrix; i.e., they need not be row-stochastic. An alternative is to develop a Bayesian model for the row-stochastic matrix involved in the PageRank calculation. The following model is one option.

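$$\begin{aligned}
Y_{ij} \mid \pi_{ij} &\overset{\text{ind}}{\sim} \mathrm{Bern}(\pi_{ij}), && i, j = 1, 2, \ldots, n,\\
\pi_i \mid \alpha &\overset{\text{ind}}{\sim} \mathrm{Dirichlet}(\alpha), && i = 1, 2, \ldots, n,\\
\alpha_1 &= \frac{1}{1 + \sum_{m=2}^{n} \mu_m}, \quad \alpha_j = \frac{\mu_j}{1 + \sum_{m=2}^{n} \mu_m}, && j = 2, 3, \ldots, n,\\
\mu_j &= \exp(x_j^{\top}\beta), && j = 2, \ldots, n,\\
\beta &\sim N(0, \tau^2 I_p),
\end{aligned} \qquad (3)$$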

where $\pi_i = (\pi_{i1}, \pi_{i2}, \ldots, \pi_{in})$, $\alpha_j$ is the $j$-th element of the $n \times 1$ vector $\alpha$ (the $\alpha_j$’s sum to one), and each $x_j$ is a $p \times 1$ vector of covariates.

Since the support of the Dirichlet distribution is the probability simplex, the $n \times n$ matrix whose $i$-th row is $\pi_i$ is row-stochastic. So, the posterior distribution of the $\pi_{ij}$ defines the posterior distribution of the PageRank transition matrix. Computing the stationary distribution for each posterior sample of this row-stochastic matrix returns a sample from the posterior distribution of the PageRank vector, and the posterior samples of the $i$-th entry of these vectors give the posterior distribution of the PageRank of the $i$-th node. Ranks assigned according to this algorithm will be referred to as Dirichlet-1-Ranks for the rest of this paper.
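A Python sketch of the Dirichlet-1-Rank computation: for each posterior draw of the row-stochastic matrix, compute the stationary distribution, then summarize each vertex's PageRank across draws. Obtaining the actual posterior draws requires MCMC for model (3); here they are replaced by hypothetical draws whose rows are sampled from a fixed Dirichlet(α) via `random.gammavariate`, purely to exercise the code. No damping is applied, on the assumption that Dirichlet rows are strictly positive (true almost surely), which makes each matrix regular.

```python
import random


def dirichlet(alpha, rng):
    """Dirichlet draw by normalizing independent Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


def stationary(P, iters=500):
    """Power iteration pi <- pi P for a strictly positive row-stochastic P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def rank_samples(matrix_draws):
    """One PageRank vector per draw of the row-stochastic matrix."""
    return [stationary(P) for P in matrix_draws]


# Hypothetical stand-in for posterior draws: S matrices with i.i.d. Dirichlet(alpha) rows.
rng = random.Random(42)
alpha, n, S = [3.0, 1.0, 1.0], 3, 200
matrix_draws = [[dirichlet(alpha, rng) for _ in range(n)] for _ in range(S)]
samples = rank_samples(matrix_draws)
for k in range(n):
    vals = sorted(s[k] for s in samples)
    print(k, round(vals[S // 2], 3), (round(vals[int(0.025 * S)], 3), round(vals[int(0.975 * S)], 3)))
```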

A generalization allows the prior on πi to vary across i:

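$$\begin{aligned}
Y_{ij} \mid \pi_{ij} &\overset{\text{ind}}{\sim} \mathrm{Bern}(\pi_{ij}), && i, j = 1, 2, \ldots, n,\\
\pi_i \mid \alpha_i &\overset{\text{ind}}{\sim} \mathrm{Dirichlet}(\alpha_i), && i = 1, 2, \ldots, n,\\
\alpha_{i1} &= \frac{1}{1 + \sum_{m=2}^{n} \mu_{im}}, && i = 1, 2, \ldots, n,\\
\alpha_{ij} &= \frac{\mu_{ij}}{1 + \sum_{m=2}^{n} \mu_{im}}, && j = 2, \ldots, n,\\
\mu_{ij} &= \exp(x_{ij}^{\top}\beta), && i = 1, 2, \ldots, n;\ j = 2, \ldots, n.
\end{aligned} \qquad (4)$$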

4. RESULTS

For our data, we consider the following covariates:

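$$X_{1i} = \begin{cases} 1 & \text{if vertex } i \text{, the “sender”, is from the UK,} \\ 0 & \text{if } i \text{ is from the US,} \end{cases}$$
$$X_{2j} = \begin{cases} 1 & \text{if vertex } j \text{, the “recipient”, is from the UK,} \\ 0 & \text{if } j \text{ is from the US.} \end{cases}$$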

We fit five specific models. The first three models use the likelihood in (1). Model 1 specifies $x_{ij} = (1, X_{1i}, X_{2j})^{\top}$, so $p = 3$, with a vague Gaussian prior $\beta \sim N(0, \tau^2 I_3)$, where $\beta = (\beta_0, \beta_1, \beta_2)$. Model 2 uses the same $x_{ij}$ as Model 1 but uses the Normal-Wishart specification for $\{\beta, \Sigma\}$ described in Section 3.2.1. Model 3 uses the same Normal-Wishart prior for $\theta = \{\beta, \Sigma\}$ as Model 2 but adds an interaction term to the regressors, so $x_{ij} = (1, X_{1i}, X_{2j}, X_{1i}X_{2j})^{\top}$ and $p = 4$, with $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$. Models 4 and 5 correspond to (3) and (4), respectively. In Model 4, we take $x_j = (1, X_{3j})^{\top}$, where

$$X_{3j} = \begin{cases} 1 & \text{if site } j \text{ is from the UK,} \\ 0 & \text{if site } j \text{ is from the US,} \end{cases}$$

and in Model 5 we take xij to be the same as for Model 3.
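For concreteness, the covariates can be built from the site numbering of Table 1, in which sites 1 to 8 are in the UK and sites 9 to 18 are in the US. This Python sketch constructs the Model 3 vector $x_{ij}$ and tabulates how many ordered site pairs fall into each sender/recipient combination; excluding self-pairs is our assumption, and the names are hypothetical.

```python
UK_SITES = set(range(1, 9))  # sites 1-8 are UK, 9-18 are US (Table 1)


def covariates(i, j):
    """x_ij = (1, X_1i, X_2j, X_1i * X_2j) for Model 3; the first three entries give Model 1."""
    x1 = 1 if i in UK_SITES else 0  # sender is a UK site
    x2 = 1 if j in UK_SITES else 0  # recipient is a UK site
    return (1, x1, x2, x1 * x2)


# All ordered pairs of distinct sites.
pairs = [(i, j) for i in range(1, 19) for j in range(1, 19) if i != j]
counts = {}
for i, j in pairs:
    key = covariates(i, j)[1:3]  # (X_1i, X_2j)
    counts[key] = counts.get(key, 0) + 1
print(counts)  # {(1, 1): 56, (1, 0): 80, (0, 1): 80, (0, 0): 90}
```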

The results from Models 1, 2, and 3 are summarized in Tables 3, 4, and 5, respectively. We report 95% Bayesian credible intervals whose endpoints are the 2.5th and 97.5th percentiles of the posterior distributions (subsequently referred to as CIs). Since the CIs for the main slope coefficients lie wholly below zero in all three models, the network suggests that the UK websites were less likely than the US sites both to “send” and to “receive” a hyperlink. Indeed, re-examining the adjacency matrix (Table 1), the UK sites are generally fairly isolated. The UK-National site links to all the other UK sites and to the US-National site, and both national sites receive links from most of the UK sites. But apart from links to or from the two national websites, the only hyperlink involving a UK site is the one from UCL to CMU, which is a link from a UK site to a US site, i.e., a transnational link.

Table 3.

Posterior summaries for Model 1

Coefficient Mean SD 95% CI
β0 −0.20 0.20 [−0.59, 0.20]
β1 −1.14 0.31 [−1.75, −0.55]
β2 −1.68 0.33 [−2.36, −1.04]

Table 4.

Posterior summaries for Model 2

Coefficient Mean SD 95% CI
β0 −0.27 0.19 [−0.65, 0.19]
β1 −1.04 0.29 [−1.61, −0.48]
β2 −1.50 0.31 [−2.15, −0.95]
σ0² 1.03 3.09 [0.11, 5.31]
σ1² 1.93 4.87 [0.19, 9.95]
σ2² 3.15 14.45 [0.30, 15.64]
ρ01 0.14 0.50 [−0.83, 0.92]
ρ02 0.18 0.50 [−0.82, 0.92]
ρ12 0.52 0.41 [−0.54, 0.97]

Table 5.

Posterior summaries for Model 3

Coefficient Mean SD 95% CI
β0 0.14 0.20 [−0.24, 0.53]
β1 −2.41 0.43 [−3.30, −1.61]
β2 −3.97 0.88 [−6.02, −2.54]
β3 4.76 1.07 [3.00, 7.22]
σ0² 0.98 3.13 [0.11, 4.77]
σ1² 6.84 37.80 [0.64, 32.06]
σ2² 16.48 60.53 [1.45, 82.03]
σ3² 23.48 85.51 [1.99, 120.05]
ρ01 −0.12 0.51 [−0.92, 0.84]
ρ02 −0.12 0.51 [−0.92, 0.84]
ρ03 0.12 0.51 [−0.84, 0.92]
ρ12 0.85 0.19 [0.29, 0.99]
ρ13 −0.86 0.19 [−1.00, −0.33]
ρ23 −0.92 0.12 [−1.00, −0.62]

While none of the correlations between the coefficients in Model 2 were deemed significantly different from zero (their credible intervals all included 0), there were statistically significant correlations between all pairs of the slope coefficients ($\beta_1$, $\beta_2$, and $\beta_3$) in Model 3. For the latter model, the interaction slope coefficient, $\beta_3$, had a 95% CI wholly above zero, which indicates significantly higher probabilities of connections between nodes within a country than between countries (described as a lack of transnational connections in [21]). We also note that, while modest posterior shrinkage is observed, the 95% CIs for the correlations in Table 4 are all wide, and there is no significant correlation between the global intercept and either of the two main-effect slope parameters. We see greater shrinkage in Table 5, clearly brought about by the introduction of the interaction parameter. Here, the correlations of the interaction effect with each of the two main-effect slopes are significant, as is the correlation between the two main-effect slopes, whereas none of the slopes is significantly correlated with the global intercept.

Model fit was assessed using the Watanabe-Akaike Information Criterion, also called the Widely Applicable Information Criterion (WAIC) [see, e.g., 12, 29]. We used the definition

$$\mathrm{WAIC} = -2\left(\mathrm{lppd} - p_{\mathrm{waic}}\right), \qquad (5)$$

where lppd is the log pointwise predictive density: for each data point, the likelihood is averaged over the posterior distribution of the model parameters, its logarithm is taken, and the results are summed over all data points. This yields a measure of predictive accuracy and is computed as

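$$\mathrm{lppd} = \sum_{i,j=1}^{n} \log E_{\theta \mid Y}\left[p(Y_{ij} \mid \theta)\right] \approx \sum_{i,j=1}^{n} \log\left(\frac{1}{S}\sum_{s=1}^{S} p\left(Y_{ij} \mid \theta^{(s)}\right)\right), \qquad (6)$$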

where θ^(s), s = 1, 2, …, S, denote posterior samples of the model parameters, and p_waic is a measure of model complexity given by the sum, over all observed data points Y_ij, of the posterior variance of the log predictive density. This is computed as

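$$p_{\mathrm{waic}} = \sum_{i,j=1}^{n} \mathrm{var}_{\mathrm{post}}\left(\log p(Y_{ij} \mid \theta)\right) \approx \sum_{i,j=1}^{n} \mathrm{var}_{\mathrm{sample}}\left(\left\{\log p\left(Y_{ij} \mid \theta^{(s)}\right) : s = 1, 2, \ldots, S\right\}\right), \qquad (7)$$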

where var_sample({log p(Y_ij | θ^(s)) : s = 1, 2, …, S}) is the sample variance of the quantities log p(Y_ij | θ^(s)) for s = 1, 2, …, S. Lower values of WAIC indicate a better-fitting model. The WAIC values for Models 1, 2, and 3 are 355.0, 315.1, and 269.5, respectively, so Model 3, which has the lowest WAIC, was used to generate draws from the posterior predictive distribution for the subsequent analysis.
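Equations (5) to (7) translate directly into a few array operations. A minimal sketch, assuming the pointwise log-likelihoods have already been evaluated into an S × N matrix (one row per posterior draw θ^(s), one column per observed directed pair; for a Bernoulli edge this is Y log p + (1 − Y) log(1 − p)). The function name `waic` is our own; this is not the paper's supplementary code.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S, N) matrix of log p(Y_k | theta^(s)) values.

    Follows equations (5)-(7): lppd is the log of the posterior-averaged
    likelihood per observation, and p_waic sums the sample variances of
    the log-likelihoods across draws.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    S = log_lik.shape[0]
    # log((1/S) * sum_s exp(ll)), computed stably by subtracting the column max.
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.exp(log_lik - m).sum(axis=0)) - np.log(S))
    # Sample variance (ddof=1) across draws for each observation, then summed.
    p_waic = np.sum(log_lik.var(axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Constant log-likelihood: lppd = 4 * (-0.5), p_waic = 0, hence WAIC = 4.
w, lppd, p = waic(np.full((1000, 4), -0.5))
print(w)  # 4.0
```

Subtracting the column maximum before exponentiating avoids underflow when individual likelihoods are tiny, which is common for sparse networks with many absent edges.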

Figure 2 plots the medians and 95% CIs of the odds ratios comparing transnational connections with intra-national connections. These intervals are based only on Model 3, and the reference groups are the two types of intra-national connections (US-to-US and UK-to-UK).
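Odds-ratio intervals of this kind can be obtained by transforming each posterior draw and then taking quantiles. A minimal sketch, assuming, hypothetically, a fixed-effects linear predictor β0 + β1·x_send + β2·x_recv + β3·x_send·x_recv with x = 1 for a US site and x = 0 for a UK site, and ignoring the random effects; the actual coding and predictor of Model 3 are defined earlier in the paper and may differ. The helper names are illustrative.

```python
import numpy as np

def linear_predictor(beta, x_send, x_recv):
    """Hypothetical fixed-effects predictor evaluated for every posterior draw (rows of beta)."""
    b0, b1, b2, b3 = beta.T
    return b0 + b1 * x_send + b2 * x_recv + b3 * x_send * x_recv

def odds_ratio_summary(beta, group, reference, level=0.95):
    """Median and equal-tailed interval of exp(eta_group - eta_reference) across draws.

    group and reference are (x_send, x_recv) pairs, e.g. (1, 0) for US-to-UK
    under the hypothetical coding US = 1, UK = 0.
    """
    log_or = linear_predictor(beta, *group) - linear_predictor(beta, *reference)
    alpha = 1.0 - level
    lo, med, hi = np.quantile(np.exp(log_or), [alpha / 2, 0.5, 1 - alpha / 2])
    return med, (lo, hi)

# Synthetic draws for illustration only (not the paper's posterior samples).
rng = np.random.default_rng(0)
beta = rng.normal([0.1, -2.4, -4.0, 4.8], [0.2, 0.4, 0.9, 1.0], size=(4000, 4))
print(odds_ratio_summary(beta, group=(1, 0), reference=(0, 0)))  # US-to-UK vs UK-to-UK
```

Because the exponential is monotone, the median and interval of the odds ratio are simply the exponentiated quantiles of the log odds ratio; working draw by draw keeps the joint posterior dependence among the β's.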

Figure 2.

Plot of the medians and 95% CIs for various odds ratios; the first two intervals are compared against a reference group of UK-to-UK connections and the second two against a reference group of US-to-US connections.

The medians and 95% posterior predictive credible intervals for the network analysis measures described in Sections 3.1.2 and 3.1.4 are presented in Tables 6 and 7, respectively. Here, the network was replicated using draws from (2), and the measures were computed for each sampled network as described in Section 3.2.2. This generated posterior predictive samples of each network analysis measure, from which we computed point estimates and 95% posterior predictive credible intervals.
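The replication step can be sketched as follows, assuming each posterior draw has already been converted into an n × n matrix of directed edge probabilities (self-links are not modeled, so the diagonal is ignored). The sketch draws one Bernoulli network per posterior draw and summarizes outdegree and indegree; the other measures in Table 6 would be computed on the same replicated networks, which the paper does with the sna package in R. The function name `replicate_degrees` is hypothetical.

```python
import numpy as np

def replicate_degrees(prob_draws, rng, level=0.95):
    """Posterior predictive quantiles of out- and indegree for every vertex.

    prob_draws: array of shape (S, n, n) with prob_draws[s, i, j] = P(edge i -> j | theta^(s)).
    """
    S, n, _ = prob_draws.shape
    # One replicated directed network per posterior draw.
    adj = rng.random((S, n, n)) < prob_draws
    adj[:, np.arange(n), np.arange(n)] = False  # no self-loops
    outdeg = adj.sum(axis=2)                    # row sums: ties sent, shape (S, n)
    indeg = adj.sum(axis=1)                     # column sums: ties received, shape (S, n)
    alpha = 1.0 - level
    qs = [alpha / 2, 0.5, 1 - alpha / 2]
    # Each result has shape (3, n): lower bound, median, upper bound per vertex.
    return np.quantile(outdeg, qs, axis=0), np.quantile(indeg, qs, axis=0)

rng = np.random.default_rng(42)
probs = np.full((2000, 18, 18), 0.15)  # illustrative constant edge probability
out_q, in_q = replicate_degrees(probs, rng)
print(out_q[:, 0])  # lower bound, median, upper bound of outdegree for vertex 1
```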

Table 6.

Medians and 95% CIs for the vertex centrality measures

Vertex Outdegree Indegree Freeman Betweenness Closeness Harary E-center
1 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.3 [0.0, 65.5] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
2 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.5 [0.0, 64.2] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
3 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.3 [0.0, 65.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
4 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.9 [0.0, 64.5] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
5 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.7 [0.0, 64.8] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
6 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.4 [0.0, 64.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
7 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.3 [0.0, 63.1] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
8 2 [0, 6] 1 [0, 4] 4 [1, 8] 5.5 [0.0, 64.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.1 [0.0, 0.3]
9 5 [2, 8] 6 [2, 9] 11 [6, 15] 13.0 [1.3, 63.5] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
10 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.1 [1.4, 64.7] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
11 5 [2, 8] 6 [2, 9] 11 [6, 15] 13.0 [1.2, 63.7] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
12 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.0 [1.3, 65.3] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
13 5 [2, 8] 6 [2, 9] 11 [6, 15] 13.2 [1.3, 65.1] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
14 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.3 [1.3, 63.1] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
15 5 [2, 8] 6 [2, 9] 11 [6, 15] 12.9 [1.2, 64.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.5] 0.3 [0.1, 0.4]
16 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.1 [1.4, 65.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
17 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.0 [1.3, 67.0] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
18 5 [2, 8] 6 [2, 9] 11 [6, 16] 13.3 [1.3, 65.4] 0.0 [0.0, 0.5] 0.0 [0.0, 0.3] 0.3 [0.1, 0.4]
Centralization 0.2 [0.1, 0.3] 0.3 [0.2, 0.4] 0.2 [0.1, 0.3] 0.1 [0.05, 0.3] 0.07 [0.0, 0.6] 0.02 [0.0, 0.4] 0.2 [0.1, 0.3]

Table 7.

Medians and 95% CIs for the measures from Krackhardt's axiomatic analysis

Measure Median 95% CI
Connectedness 1.00 [0.79, 1.00]
Efficiency 0.82 [0.75, 0.88]
Hierarchy 0.33 [0.00, 0.68]
LUBness 1.00 [0.11, 1.00]

The point estimates of the vertex analysis measures in Table 6 appear consistent with the earlier findings reported in [21] (not reproduced here); in addition, the intervals quantify uncertainty, which enables social scientists to attach confidence to their conclusions. In general, the credible intervals for a given vertex measure overlap substantially across vertices, especially those for betweenness, closeness, Harary centrality, and eigenvector centrality. The structure statistics table indicates that at least half of all possible site pairs have a geodesic distance of 4 or less.
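A statement such as "at least half of all site pairs lie within geodesic distance 4" can be checked on any single observed or replicated network by breadth-first search. A minimal sketch, assuming a directed 0/1 adjacency matrix and treating unreachable ordered pairs as infinitely distant; the normalization used by sna's structure statistics may differ, and the helper names are our own.

```python
from collections import deque

import numpy as np

def geodesic_distances(adj):
    """All-pairs directed shortest-path lengths by BFS; np.inf marks unreachable pairs."""
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(dist[source, v]):  # first visit gives the shortest distance
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist

def fraction_within(adj, k):
    """Share of ordered pairs (i != j) whose geodesic distance is at most k."""
    dist = geodesic_distances(adj)
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    return np.mean(dist[off_diag] <= k)

# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0: distances 1, 2, 3 each occur for 4 ordered pairs.
cycle = np.zeros((4, 4), dtype=int)
for i in range(4):
    cycle[i, (i + 1) % 4] = 1
print(fraction_within(cycle, 2))
```

Applied to each posterior predictive network, `fraction_within(adj, 4)` yields draws whose quantiles would give an interval for this structure statistic.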

Turning to Krackhardt's axiomatic analysis in Table 7, the network appears to be highly connected, since the 95% CI for connectedness is [0.79, 1.00]. The network also appears fairly efficient, with an interval of [0.75, 0.88]; each of Krackhardt's measures ranges from 0 to 1. These point estimates appear consistent with the conclusions drawn from the structure statistics table reported in [21]. The network does not appear to be strongly hierarchical, with a CI of [0.00, 0.68] for that measure. The CI for LUBness, [0.11, 1.00], spans almost the entire range of possible values, indicating that the network carries little information about that property.
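Krackhardt's connectedness is one minus the proportion of unordered vertex pairs that cannot reach each other even when tie direction is ignored. A minimal sketch, assuming a directed 0/1 adjacency matrix; it is our own implementation via weak components, not sna's code. Evaluating it on each replicated network and taking quantiles would give an interval of the kind reported in Table 7.

```python
import numpy as np

def krackhardt_connectedness(adj):
    """1 - (mutually unreachable unordered pairs) / (n(n-1)/2), ignoring tie direction."""
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    sym = adj | adj.T                      # symmetrized (weak) version of the digraph
    labels = -np.ones(n, dtype=int)
    comp = 0
    for start in range(n):                 # label weak components by depth-first search
        if labels[start] >= 0:
            continue
        stack = [start]
        labels[start] = comp
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(sym[u]):
                if labels[v] < 0:
                    labels[v] = comp
                    stack.append(v)
        comp += 1
    sizes = np.bincount(labels)
    reachable_pairs = np.sum(sizes * (sizes - 1) // 2)  # pairs within the same component
    total_pairs = n * (n - 1) // 2
    return reachable_pairs / total_pairs

# Two disconnected dyads among four vertices: 2 of the 6 pairs are connected.
a = np.zeros((4, 4), dtype=int)
a[0, 1] = 1
a[2, 3] = 1
print(krackhardt_connectedness(a))
```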

Finally, summaries of the posterior distributions of β in Models 4 and 5 are given in Tables 8 and 9, respectively. The regression coefficients here are interpreted differently from those in Models 1–3: the covariates are incorporated in the μ's of Models 4 and 5 solely to allow the Dirichlet distribution to vary across nodes, thereby adding flexibility. Our main inferential goal concerns the ranks. The medians and 95% CIs for the Dirichlet-1-Ranks are presented in Figure 3, and a summary plot for the Dirichlet-2-Ranks is presented in Figure 4. For the most part, the two plots are similar. The posterior medians appear broadly consistent with the PageRank measures in Table 2, but the uncertainty quantified here, which the deterministic PageRank algorithm cannot provide, calls for caution before drawing strong conclusions from these rankings.
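For contrast with the Dirichlet ranks, the deterministic PageRank scores mentioned above can be computed by power iteration. A minimal sketch under common assumptions: damping factor 0.85, uniform teleportation, and dangling nodes (sites with no outgoing links) spreading their score uniformly. The settings behind Table 2 are not restated in this section and may differ.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank vector of a directed 0/1 adjacency matrix (adj[i, j] = 1 for a link i -> j)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out = adj.sum(axis=1)
    # Row-stochastic transition matrix; rows of dangling nodes jump uniformly.
    P = np.where(out[:, None] > 0, adj / np.where(out > 0, out, 1)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_new - r).sum() < tol:  # L1 change below tolerance
            return r_new
        r = r_new
    return r

# Two sites linking to a third, which links nowhere: the third site ranks highest.
star = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 0]])
print(pagerank(star))
```

Running this once on the observed network gives a single score per site with no measure of uncertainty, which is the limitation the Dirichlet-Ranks address.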

Table 8.

Posterior summaries for Model 4

Coefficient Mean SD 95% CI
β0 0.85 0.38 [0.10, 1.56]
β1 −1.08 0.22 [−1.46, −0.69]

Table 9.

Posterior summaries for Model 5

Coefficient Mean SD 95% CI
β0 11.39 92.12 [−163.71, 188.39]
β1 14.02 94.39 [−173.11, 198.68]
β2 −0.51 0.16 [−0.82, −0.19]
β3 0.61 0.34 [−0.13, 1.19]

Figure 3.

Plot of the medians and 95% CIs for the Dirichlet-1-Ranks for the 18 sites. Blue bars indicate UK sites and red bars indicate US sites.

Figure 4.

Plot of the medians and 95% CIs for the Dirichlet-2-Ranks for the 18 sites. Blue bars indicate UK sites and red bars indicate US sites.

While most of the 95% CIs here still overlap, there is, as in the other results, a split between the US and UK sites: the UK sites have median estimates consistently below those of the US sites. There is also a spike in the estimates for sites 3 and 14, the two national sites, which was not observed in the other statistical results presented here. Notably, the CI for the US-National site does not overlap with that of any UK site except the UK-National website. These last two observations suggest that both Dirichlet-Ranks are more sensitive than the other statistical approaches presented in this paper.

5. CONCLUSION

This paper has presented parametric Bayesian methods for the structural analysis of small networks with limited information. The intended contribution is to offer sociologists easily implementable Bayesian models that provide uncertainty quantification while carrying out descriptive SNA. Basic descriptive measures of directed graphs, PageRank algorithms, and Krackhardt's axiomatic analysis are considered in conjunction with parametric Bayesian hierarchical models. The Bayesian paradigm is particularly attractive here because inference does not rely on asymptotic arguments in which the number of vertices (or edges) grows without bound. The methods are easily implemented using the R packages sna and rjags. Bayesian inference provides new insights into a structurally sparse network of Hindu student groups, previously studied without uncertainty quantification, to better understand transnational identities.

Our treatment here is by no means exhaustive. For example, numerous other structural quantities could be added to the measures considered in Section 3, as described, for example, in [3, 5, 20, 25, 26, 27]. Our choice of measures and models has been dictated to a substantial extent by their accessibility to practicing social scientists in the form of R packages. Further investigations along these lines can be part of future studies. We can also easily incorporate latent random effects in the mean structures of each of Models 1–5. We have deliberately not fitted such more complex models here, because the random effects would be unable to learn much from a small network. This restraint is central to the objectives of this paper, which focuses on methods that work well for small networks while remaining equally applicable to large networks.

In addition, there are other stochastic network models, such as exponential random graph models and Erdős-Rényi models [see, e.g., 4, 23]. The logistic regression models described here can be viewed as heterogeneous versions of the Erdős-Rényi model, in which the probability of a directed edge between two nodes is not uniform but depends on attributes (covariates) of the sender and receiver. There are other models, such as the Barabási-Albert model [22], that are generative and can simulate artificial networks with given characteristics, but it is less clear how to carry out model-based Bayesian inference from observed networks with these models, as we do here. We identify these as possible areas for future research.
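The sense in which the logistic models generalize the Erdős-Rényi model can be made concrete: with all slopes set to zero, every directed edge has the same probability expit(β0). A minimal sketch, assuming a hypothetical 0/1 node attribute x (for instance US = 1, UK = 0) entering as sender, receiver, and interaction terms; the paper's covariates are defined in its model section, and the function name is illustrative.

```python
import numpy as np

def edge_probabilities(beta, x):
    """Directed edge probabilities p_ij = expit(b0 + b1*x_i + b2*x_j + b3*x_i*x_j).

    x is a hypothetical 0/1 node attribute; the diagonal is set to 0 (no self-links).
    """
    b0, b1, b2, b3 = beta
    x = np.asarray(x, dtype=float)
    eta = b0 + b1 * x[:, None] + b2 * x[None, :] + b3 * np.outer(x, x)
    p = 1.0 / (1.0 + np.exp(-eta))
    np.fill_diagonal(p, 0.0)
    return p

x = np.array([0] * 9 + [1] * 9)                         # illustrative split of 18 nodes only
p_er = edge_probabilities([-1.0, 0.0, 0.0, 0.0], x)     # homogeneous: Erdős-Rényi
p_het = edge_probabilities([0.1, -2.4, -4.0, 4.8], x)   # covariate-dependent probabilities
```

In the homogeneous case every off-diagonal entry equals expit(−1); in the heterogeneous case the probabilities differ by the sender and receiver groups, which is the only departure from the Erdős-Rényi model.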

Supplementary Material

Appendix

Footnotes

6. SUPPLEMENTARY MATERIAL

Supplement to the paper (http://intlpress.com/site/pub/pages/journals/items/sii/content/vols/0012/0001/s004) comprises the R programs used for the analysis.

*

We thank the Editors, the Associate Editor and Referees for their suggestions that improved the article considerably. The third author’s work was supported, in part, by grants NIH/NIEHS 1R01ES027027-01, NSF IIS-1562303 and NSF DMS-1513654.

Contributor Information

Thomas Nemmers, Lockton Companies, LLC, 4275 Executive Square, Suite 600, La Jolla, CA 92037, USA.

Anjana Narayan, Department of Psychology and Sociology, California State Polytechnic University, 3801 West Temple Avenue, Pomona, CA 91768, USA.

Sudipto Banerjee, UCLA Department of Biostatistics, 650 Charles E. Young Dr. South, Los Angeles, CA 90095, USA.

REFERENCES

  • [1] Banerjee S and Roy A (2014). Linear Algebra and Matrix Analysis for Statistics, 1st ed. CRC Press, Boca Raton, FL.
  • [2] Bedrick EJ, Christensen R and Johnson W (1996). A new perspective on priors for generalized linear models. Journal of the American Statistical Association, 91, 1450–1460.
  • [3] Boccaletti S, Latora V, Moreno Y, Chavez M and Hwang D-U (2006). Complex networks: Structures and dynamics. Physics Reports, 424, 175–308.
  • [4] Bollobás B (2001). Random Graphs, 2nd ed. Cambridge University Press, Cambridge, UK.
  • [5] Borgatti SP (2005). Centrality and network flow. Social Networks, 27, 55–71.
  • [6] Brin S and Page L (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30, 107–117.
  • [7] Chen M-H and Shao Q-M (2000). Propriety of posterior distribution for dichotomous quantal response models. Proceedings of the American Mathematical Society, 129, 293–302.
  • [8] Chikuse Y (2003). Statistics on Special Manifolds. Lecture Notes in Statistics, vol. 174. Springer-Verlag, New York. ISBN 0-387-00160-3.
  • [9] Freeman LC (1977). A set of measures of centrality based on betweenness. Sociometry, 40, 35–41.
  • [10] Freeman LC (1979). Centrality in social networks I: Conceptual clarification. Social Networks, 1, 215–239.
  • [11] Gelman A, Jakulin A, Pittau MG and Su Y-S (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics, 2, 1360–1383.
  • [12] Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A and Rubin DB (2013). Bayesian Data Analysis, 3rd ed. CRC Press, Boca Raton, FL.
  • [13] Costa L da F, Rodrigues FA, Travieso G and Villas Boas PR (2007). Characterization of complex networks: A survey of measurements. Advances in Physics, 56, 167–242.
  • [14] Harville DD (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag, New York.
  • [15] Hoff PD, Raftery AE and Handcock MS (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97, 1090–1098.
  • [16] Ibrahim JG and Laud PW (1991). On Bayesian analysis of generalized linear models using Jeffreys's prior. Journal of the American Statistical Association, 86, 981–986.
  • [17] Kemeny JG and Snell JL (1976). Finite Markov Chains. Springer-Verlag, New York.
  • [18] Krackhardt D (1994). Graph theoretical dimensions of informal organizations. In Carley KM and Prietula MJ (Eds.), Computational Organization Theory, pp. 89–111. Lawrence Erlbaum Associates, Hillsdale, NJ.
  • [19] Langville AN and Meyer CD (2006). Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ.
  • [20] Newman MEJ (2010). Networks: An Introduction. Oxford University Press, Oxford, UK.
  • [21] Narayan A, Purkayastha B and Banerjee S (2011). Constructing transnational and virtual ethnic identities: A study of the discourse and networks of ethnic student organisations in the USA and UK. Journal of Intercultural Studies, 32, 515–537.
  • [22] Albert R and Barabási A-L (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97.
  • [23] Erdős P and Rényi A (1959). On random graphs. I. Publicationes Mathematicae, 6, 290–297.
  • [24] Robins G, Snijders T, Wang P, Handcock M and Pattison P (2007). Recent developments in exponential random graph (p*) models for social networks. Social Networks, 29, 192–215.
  • [25] Scott J and Carrington PJ (2011). The SAGE Handbook of Social Network Analysis. SAGE Publications Ltd, London, UK.
  • [26] Victor JN, Montgomery AH and Lubell M (2018). The Oxford Handbook of Political Networks. Oxford University Press, New York.
  • [27] Boldi P and Vigna S (2014). Axioms for centrality. Internet Mathematics, 10, 222–262.
  • [28] Wasserman S and Faust K (1994). Social Network Analysis: Methods and Applications. Cambridge University Press, New York.
  • [29] Watanabe S (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14, 867–897.
  • [30] Zellner A and Rossi PE (1984). Bayesian analysis of dichotomous quantal response models. Journal of Econometrics, 25, 365–393.
