Science Advances. 2024 May 3;10(18):eadj0104. doi: 10.1126/sciadv.adj0104

Proper network randomization is key to assessing social balance

Bingjie Hao 1, István A Kovács 1,2,3,*
PMCID: PMC11068007  PMID: 38701217

Abstract

Social ties, either positive or negative, lead to signed network patterns, the subject of balance theory. For example, strong balance introduces cycles with even numbers of negative edges. The statistical significance of such patterns is routinely assessed by comparisons to null models. Yet, results in signed networks remain controversial. Here, we show that even if a network exhibits strong balance by construction, current null models can fail to identify it. Our results indicate that matching the signed degree preferences of the nodes is a critical step and so is the preservation of network topology in the null model. As a solution, we propose the STP null model, which integrates both constraints within a maximum entropy framework. STP randomization leads to qualitatively different results, with most social networks consistently demonstrating strong balance in three- and four-node patterns. On the basis of our results, we present a potential wiring mechanism behind the observed signed patterns and outline further applications of STP randomization.


The underlying principles of social networks are found through a network randomization method.

INTRODUCTION

Individuals within society can be viewed as nodes in a social network, with edges representing various relationships between them. These relationships are highly diverse in their nature and can often be expressed in either positive (friend/trust) or negative (foe/distrust) terms (1), leading to signed social networks, with varying degrees of polarization (2, 3). Quantifying the abundance of network patterns is the first step toward understanding why certain connections are formed and not others, captured by the underlying wiring mechanisms, as well as toward understanding and potentially reducing polarization in social media (2, 4–8). As a key concept, network graphlets (and motifs) (9, 10) are patterns of connections that occur significantly more frequently than in a null model, which is a suitably randomized version of the empirical data (9). Graphlets, also known as induced subgraphs (11), specify the existence and sign of every edge within a subset of nodes. In contrast, motifs (or noninduced subgraphs) (9), specify only the required edges, allowing for the presence or absence of other edges. For instance, in an undirected signed network, a graphlet consisting of three nodes connected by two edges indicates the absence of the third edge, while a motif detects instances both with and without the third edge. Note that any fully connected graphlet can be equivalently referred to as a motif.

Seminal studies have shown that network graphlets and motifs play an important role in understanding the organization, functionality, and hidden mechanisms behind many complex systems, from social networks to brain connectivity and protein-protein interaction networks (9–17). Fully connected “triangle” graphlets of three nodes are particularly informative on tie formation mechanisms between acquaintances of the same node. As a starting point, strong balance (SB) (18) captures the intuitive notions of “the friend of my friend is my friend,” “the enemy of my friend is my enemy,” and “the enemy of my enemy is my friend.” All these examples correspond to balanced cycles (a path that starts and ends at the same node) of length three, where the product of edge signs along the cycle is positive. The notion of SB has been extended to cycles of any length, stating that a network is maximally balanced if all cycles are balanced (18). In practice, there are often deviations from maximal balance (19), requiring the statistical analysis of the enrichment of the studied patterns versus a null model. A null model of statistical power is a randomized network that is as close to the real network as possible without capturing the actual wiring mechanisms. Although it is generally believed that social networks tend to be in somewhat balanced states (20, 21), the conclusions about balance strongly depend on the studied datasets and the chosen null model (22–25).

As a basic example, the “rewire” null model (23) swaps edges between nodes while preserving the node degree (k, number of neighbors), leading to networks with disrupted topology. Hence, the conclusions based on the rewire null model mix the pattern formation mechanisms arising from edge signs with those of purely topological origin. For example, an overrepresentation of certain patterns might stem solely from the observation that those patterns have a low probability of forming at a purely topological level, regardless of the edge signs.

Here, we aim to disentangle purely topological effects from mechanisms of balance related to edge signs by fixing the topology while randomizing the edge signs. As a realization, a more commonly used null model that preserves the network topology is the “sign shuffle” null model (22). In this null model, the total number of positive and negative edges is exactly preserved, while the sign is randomly assigned to each edge. Note that the sign shuffle null model has the limitation that all nodes are assumed to have the same expected ratio of positive edges. As illustrated in Fig. 1, this assumption is far from reality. In real-life networks, some nodes are more “friendly” (“hostile”) than others, i.e., holding mostly positive (negative) edges. Consequently, null models that neglect the signed node degree could yield biased conclusions regarding balance.
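A minimal sketch of the sign shuffle idea described above, assuming the network is stored as a dictionary that maps each undirected edge to +1 or -1. The data representation and the function name are illustrative assumptions, not part of the original work.

```python
import random

def sign_shuffle(signed_edges, rng=None):
    """Keep every edge in place and randomly permute the signs among them.

    signed_edges: {(u, v): +1 or -1}. The topology and the total numbers of
    positive and negative edges are preserved exactly.
    """
    rng = rng or random.Random()
    edges = list(signed_edges)
    signs = [signed_edges[e] for e in edges]
    rng.shuffle(signs)  # in-place permutation of the sign list
    return dict(zip(edges, signs))
```

Because signs are permuted globally, every edge has the same chance of being positive regardless of its endpoints, which is exactly the homogeneity assumption questioned in the text.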

Fig. 1. Signed degree correlations.

The positive (k+) and negative (k−) node degree correlation in (A) Slashdot, (B) Congress, (C) Bitcoin-Alpha, and Bitcoin-OTC (D) Epinions. The r values denote the Pearson correlation coefficient between k+ and k− of each dataset, indicating a moderate correlation between k+ and k−. The dashed line represents the linear fit.

To incorporate both insights, a null model that preserves both the network topology and signed node degree is needed. State-of-the-art null models only preserve one of these constraints (22, 23, 26). As a solution, we propose an alternative null model, a signed degree and topology preserving (STP) null model based on the maximum entropy framework (27–31). The STP null model preserves the network topology exactly, while also matching the signed node degrees on average (see Materials and Methods).
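The exact formulation is given in the Materials and Methods and is not reproduced in this section. The sketch below shows one standard way to realize a maximum-entropy ensemble with fixed topology and signed degrees matched on average: each existing edge is positive with probability sigmoid(theta_u + theta_v), and the node parameters are fitted by gradient ascent. This logistic form, the fitting procedure, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
import random

def positive_probability(theta, u, v):
    """Assumed form: P(edge u-v is positive) = sigmoid(theta[u] + theta[v])."""
    return 1.0 / (1.0 + math.exp(-(theta[u] + theta[v])))

def fit_stp(signed_edges, n_iter=2000):
    """Fit one parameter per node so that the expected positive degree of each
    node equals its observed positive degree. Since the topology is fixed, the
    expected negative degree is matched as well."""
    theta, k_pos, degree = {}, {}, {}
    for (u, v), s in signed_edges.items():
        for x in (u, v):
            theta.setdefault(x, 0.0)
            degree[x] = degree.get(x, 0) + 1
            k_pos[x] = k_pos.get(x, 0) + (s > 0)
    step = 1.0 / max(degree.values())  # conservative step size
    for _ in range(n_iter):
        expected = dict.fromkeys(theta, 0.0)
        for u, v in signed_edges:
            p = positive_probability(theta, u, v)
            expected[u] += p
            expected[v] += p
        for x in theta:
            # Gradient of the log-likelihood of the observed signs.
            theta[x] += step * (k_pos[x] - expected[x])
    return theta

def sample_stp(signed_edges, theta, rng=None):
    """Draw one randomized network: same edges, independently resampled signs."""
    rng = rng or random.Random()
    return {(u, v): 1 if rng.random() < positive_probability(theta, u, v) else -1
            for u, v in signed_edges}
```

Nodes whose edges all share one sign push their parameter toward plus or minus infinity, so in this sketch the fit only approaches the target degrees for such nodes; a production implementation would need a stopping criterion and a faster solver.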

We examine the signed network patterns on a collection of signed social networks covering datasets of various scales, including (i) Slashdot, a friend/foe network in the technological news site Slashdot (1); (ii) Congress, a political network where signed edges represent (un/)favorable interactions between US congresspeople on the House floor in 2005 (32); (iii) Bitcoin-Alpha, a trust/distrust network of Bitcoin traders on the platform Bitcoin Alpha (33); (iv) Bitcoin-OTC, a trust/distrust network of Bitcoin traders on the platform Bitcoin OTC (34); and (v) Epinions, a trust/distrust network among users of the product review site Epinions (1). For an overview of these datasets, see Table 1. As a key motivation for our work, we observe that positive (k+) and negative (k−) node degrees are at best moderately correlated in the studied networks (Fig. 1), indicating that null models that do not consider the signed degree as a confounding factor may lead to biased results. As a key result, we show that, apart from the STP null model, none of the studied null models detect SB even in a simple reference network (SB reference), which is explicitly designed to exhibit SB. In the example of large-scale signed social networks, we show that the STP null model changes the results qualitatively, leading to a consistent interpretation of signed patterns. We conclude by discussing potential underlying pattern formation mechanisms behind our observations, as well as further applications and extensions of STP randomization.

Table 1. Overview of studied networks.

Dataset          Slashdot   Congress   Bitcoin-Alpha   Bitcoin-OTC   Epinions   SB ref    EC ref
Nodes            82,052     219        3766            5857          119,070    120,000   120,000
Edges            498,527    520        13,872          21,131        701,569    547,868   790,591
Density          0.00015    0.02178    0.00196         0.00123       0.00010    0.00008   0.00011
Positive ratio   0.76411    0.79615    0.91703         0.86271       0.83215    0.82187   0.72319

RESULTS

Signed null models

To investigate how the topology and signed degree affect the graphlet statistics, we consider four null models for signed networks, see Fig. 2. In addition to the commonly used rewire and sign shuffle null models and our STP null model, we also consider the “signed rewire” null model (35). The signed rewire null model preserves the signed node degree by rewiring the positive and negative subgraphs separately. As a result, it preserves the signed node degrees while disrupting the topology. In Fig. 2, we illustrate the studied null models on a toy network satisfying SB. This toy network contains two groups of nodes (indicated by different node colors), with positive edges among group members and negative edges between the groups (36). Note that some nodes are more friendly (like node 0) or more hostile (like node 1), i.e., have a higher fraction of positive (or negative) edges than others.
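The two rewiring-based null models can be sketched with degree-preserving double-edge swaps. The code below is an illustration under stated assumptions: nodes are integers, edges are stored as {(u, v): sign} with u < v, and in the plain rewire variant each swapped edge carries its sign to its new endpoints (the text does not specify how signs are handled). Restricting swaps to pairs of edges with the same sign gives the signed rewire variant. All names are hypothetical.

```python
import random

def double_edge_swaps(signed_edges, n_swaps, same_sign_only, rng=None):
    """Degree-preserving rewiring of {(u, v): sign} with integer nodes, u < v."""
    rng = rng or random.Random()
    edges = dict(signed_edges)
    done = 0
    for _ in range(100 * n_swaps):           # cap attempts so the loop always ends
        if done == n_swaps:
            break
        e1, e2 = rng.sample(list(edges), 2)
        s1, s2 = edges[e1], edges[e2]
        if same_sign_only and s1 != s2:
            continue                          # signed rewire: swap within one sign only
        (a, b), (c, d) = e1, e2
        if rng.random() < 0.5:
            c, d = d, c                       # pick one of the two possible swaps
        new1 = (min(a, d), max(a, d))
        new2 = (min(c, b), max(c, b))
        if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
            continue                          # would create a self-loop or multi-edge
        del edges[e1], edges[e2]
        edges[new1], edges[new2] = s1, s2     # each new edge inherits one old sign
        done += 1
    return edges

def signed_degrees(signed_edges):
    """Return {node: [k_plus, k_minus]}."""
    deg = {}
    for (u, v), s in signed_edges.items():
        for x in (u, v):
            deg.setdefault(x, [0, 0])[0 if s > 0 else 1] += 1
    return deg
```

With `same_sign_only=True` every node keeps both its positive and negative degree; with `same_sign_only=False` only the total degree is guaranteed to be preserved. In both cases the topology changes.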

Fig. 2. Overview of signed null models.

A small toy network that contains two groups of nodes (even in yellow and odd in gray) is shown in the middle. The network is designed to be strongly balanced with positive edges only between members of the same group and negative edges only between members of different groups. We consider four null models: (i) rewire, disrupts the topology and signed node degrees; (ii) sign shuffle, preserves the topology, disrupts signed node degrees; (iii) signed rewire, preserves signed node degrees, disrupts the topology; and (iv) STP, preserves both the topology and signed node degrees. Positive edges are shown in blue, while negatives are in red. Thicker lines indicate edges that are different from the original network.

Signed triangle patterns

To test social balance in real networks, we first consider the signed fully connected three-node graphlets, triangles, as illustrated in Fig. 3A for the Slashdot network. Each triangle graphlet is counted (nobs) and compared to the frequency distribution of such triangles (nrand) in the four null models. We first perform a normality test (37) for the null model frequencies of each graphlet nrand and achieve P > 0.05 for most cases. This implies the lack of substantial evidence to reject the null hypothesis that nrand conforms to a normal distribution (fig. S1). This is expected, as nrand is the sum of several almost independent random variables. Thus, we use the routinely applied z score as a measure to assess the enrichment or depletion of graphlets, computed as

$$z = \frac{n_{\mathrm{obs}} - \langle n_{\mathrm{rand}} \rangle}{\sigma_{\mathrm{rand}}} \qquad (1)$$

where <nrand> and σrand denote the mean and SD of nrand in 1000 random samples, respectively. To assess the significance of the results, we calculate the empirical P value in Figs. 3 and 4 (see Materials and Methods for details). We only interpret significant results with P < 0.01 and ∣z∣ > 2. In addition, we also calculate the fold change = nobs/<nrand> to indicate the relative abundance of the studied patterns.
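The z score of Eq. 1 and the fold change can be computed from the observed count and the counts in the randomized samples as in the sketch below. The function name is illustrative, and the use of the sample SD (rather than the population SD) is an assumption. A vanishing SD is reported as an undetermined z score.

```python
from statistics import mean, stdev

def enrichment(n_obs, n_rand_samples):
    """Return (z score, fold change) of an observed graphlet count relative to
    counts from randomized networks; None marks an undetermined value."""
    mu = mean(n_rand_samples)
    sigma = stdev(n_rand_samples)            # sample SD; an assumption
    z = (n_obs - mu) / sigma if sigma > 0 else None
    fold = n_obs / mu if mu > 0 else None
    return z, fold
```

For example, `enrichment(12, [8, 10, 12])` uses a mean of 10 and an SD of 2 for the randomized counts.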

Fig. 3. Signed triangles in the Slashdot network compared to different null models.

(A) The log2(fold change) is shown with gray dashed lines indicating a twofold increase or decrease. (B) z scores are shown in white if matching SB expectations and in black otherwise. The background of the z scores is blue for positive values and red for negative values. We list the balanced graphlets first, separated from the unbalanced graphlets by a black line. * marks significant results with ∣z∣ > 2 and P < 0.01. The statistical analysis is performed using a sample size of n = 1000.

In the Slashdot network, both the rewire and sign shuffle null models would conclude that only + + − triangles are underrepresented (Fig. 3A). This conclusion aligns with the notion of weak balance (WB) (1, 38, 39). WB relaxes the notion of balance so that only triangles with exactly one negative edge should be underrepresented. On the contrary, the signed rewire and STP null models identify that both − − − and + + − triangles are underrepresented, in line with SB. To gain more insight, we benchmark the performance of different null models by constructing a simple SB reference network (40) that is designed to exhibit SB (see details in Materials and Methods). As a clear limitation, the rewire and sign shuffle null models fail to detect SB even in the SB reference network (Fig. 4), as they mistakenly identify the + − − triangles as being underrepresented. This observation indicates the essential role of matching the heterogeneous signed degrees in the null models, narrowing down suitable null models to the signed rewire and STP models.
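The four signed triangle types discussed above can be tallied with a short routine such as the following. It is a sketch for small networks, assuming orderable node labels and the {(u, v): sign} edge dictionary used for illustration; it is not optimized for networks with hundreds of millions of triangles.

```python
from collections import Counter
from itertools import combinations

TRIANGLE_NAMES = ["+ + +", "+ + -", "+ - -", "- - -"]  # indexed by number of negative edges

def count_signed_triangles(signed_edges):
    """Count triangles by sign pattern in an undirected signed network."""
    adj = {}
    for (u, v), s in signed_edges.items():
        adj.setdefault(u, {})[v] = s
        adj.setdefault(v, {})[u] = s
    counts = Counter()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            # Count each triangle once, at its smallest node.
            if u < v and u < w and w in adj[v]:
                n_neg = sum(s < 0 for s in (adj[u][v], adj[u][w], adj[v][w]))
                counts[TRIANGLE_NAMES[n_neg]] += 1
    return counts
```

Under SB, the "+ + +" and "+ - -" types are balanced (positive sign product), while "+ + -" and "- - -" are unbalanced.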

Fig. 4. Overview of the results for fully connected graphlets/motifs.

The z scores are indicated by blue (overrepresented) and red (underrepresented) blocks. We list the balanced graphlets first, separated from the unbalanced graphlets by a black line. We leave the block white if σrand = 0 as it leads to an undetermined z score. Significant results with both ∣z∣ > 2 and P < 0.01 are indicated by *. The statistical analysis is performed using a sample size of n = 1000.

With the STP null model, we observe significant SB in all studied datasets at the triangle level (Fig. 4). The results of the signed rewire null model are again consistent with SB, apart from the Epinions dataset, where the + + − pattern appears to be overrepresented. To underline the importance of reference datasets, the rewire and sign shuffle null models appear to be consistent with WB in all real networks, a disturbingly misleading result, as these null models fail to detect SB even in the SB reference dataset. To sum up, null models that disrupt the signed degree preferences lead to erroneous conclusions at the triangle level. This incoherence is further amplified when analyzing four-node graphlets, as discussed next.

Signed four-node patterns

Four-node graphlets are useful in uncovering higher-order structures and assessing network reliability across various fields (41–43). However, as we will demonstrate, when analyzing social networks, comparing the observed frequencies of signed four-node graphlets to existing null models often yields inconsistent conclusions about structural balance across datasets. This again highlights the need for a proper null model that can help uncover the effect of balance in social networks. Just like for three-node graphlets, we start by considering fully connected four-node graphlets, squareX (patterns 5 to 15 in Fig. 4). We will then discuss four-node graphlets that are missing either one (squareZ) or both (square) diagonal edges. We define these graphlets to be balanced if all the cycles within the graphlet are balanced. Just like at the triangle level, the rewire and sign shuffle null models fail to detect SB at the squareX level even in the SB reference network (balanced patterns 5 to 6 appear to be underrepresented), rendering them unsuitable for our purposes (Fig. 4). In terms of the real-life datasets, the picture appears to be rather confusing, as each of the unbalanced squareX graphlets can be either significantly over- or underrepresented depending on the choice of network data and the null model (Fig. 4). In stark contrast, with the STP null model, the results are consistent with SB. The signed rewire null model again leads to inconsistent results across the datasets, as in addition to the Epinions dataset, Slashdot also appears to violate SB for multiple graphlets.
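Whether all cycles of a graphlet are balanced can be checked without enumerating cycles. By Harary's theorem, a signed graph is balanced exactly when its nodes can be split into two groups with positive edges inside groups and negative edges between them. The sketch below tests this by a breadth-first two-coloring; the function name and edge representation are illustrative assumptions.

```python
from collections import deque

def is_balanced_graph(signed_edges):
    """Return True if every cycle of the signed graph has a positive sign product."""
    adj = {}
    for (u, v), s in signed_edges.items():
        adj.setdefault(u, []).append((v, s))
        adj.setdefault(v, []).append((u, s))
    group = {}
    for start in adj:
        if start in group:
            continue
        group[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                expected = group[u] * s   # same group across "+", other group across "-"
                if v not in group:
                    group[v] = expected
                    queue.append(v)
                elif group[v] != expected:
                    return False          # conflicting assignment: an unbalanced cycle exists
    return True
```

For instance, the + − + − square passes this check because its single cycle contains two negative edges, whereas a square with exactly one negative edge does not.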

Since squareX graphlets are considered to be combinations of triangles, it is natural to expect that squareX graphlets are balanced if triangles are balanced. It then looks surprising that the signed rewire model fails to detect SB for squareX patterns in Slashdot, in contrast to the observed SB at the triangle level. This is an example that even for graphlets composed of triangles, the results do not simply follow from those for triangles. The reason is that significant SB at the triangle level still often comes with a considerable number of unbalanced triangles, contributing in a nonlinear way to the statistics of squareX and squareZ graphlets. Yet, four-node graphlets with fewer constraining edges may exhibit even less balance, calling for the analysis of squareZ (graphlets 16 to 29) and square (graphlets 30 to 35) patterns (Fig. 5). With the STP null model, squareZ graphlets again show consistency with SB (Fig. 5). Just like before, the rewire and sign shuffle null models fail for the SB reference network. At the same time, the signed rewire model deviates from SB only for the Epinions dataset.

Fig. 5. Overview of the results for four-node graphlets with some edges missing (squareZ and square).

The z scores are indicated by blue (overrepresented) and red (underrepresented) blocks. We list the balanced graphlets first, separated from the unbalanced graphlets by a black line. We leave the block white if σrand = 0 as it leads to an undetermined z score. Significant results with both ∣z∣ > 2 and P < 0.01 are indicated by *. The statistical analysis is performed using a sample size of n = 1000.

Square graphlets can provide additional information as they are not constrained by triangle statistics. In this case, the signed rewire null model also fails to detect SB in the SB reference network, in addition to the rewire and sign shuffle models. Specifically, the + + + + graphlet appears to be significantly underrepresented according to the signed rewire null model (Fig. 5), leaving the STP null model as the only viable null model.

Although there is no anticipation of SB at the square level from the triangle results, we observe significant SB in most of the studied datasets when compared to the STP null model (Fig. 5). The two potential exceptions are the financial datasets, namely, the significant underrepresentation of + − + − in Bitcoin-OTC and the depletion of − − − − in Bitcoin-Alpha, although this deviation is not significant. Note that most of the motif results align with the corresponding graphlet results for the STP null model (fig. S3). The only two exceptions are: − − − − in Epinions, which becomes significantly underrepresented, and + − + − in Bitcoin-OTC, which becomes significantly overrepresented. The observed cases of SB at the square level (with STP null model) call for an interpretation, independently from patterns at the triangle level. When network topology and node preferences are considered, square-level balance reveals additional balance not captured by triangle-level balance alone, as follows. (i) If two nodes have a shared enemy (friend), they may have more shared enemies (friends), corresponding to + + + + and − − − −; (ii) if two nodes have one shared enemy (friend), they may have more shared friends (enemies), corresponding to + + − −; and (iii) if two nodes have opposite attitudes toward a common neighbor, they may hold opposite attitudes toward more neighbors, corresponding to + − + −.

In addition, we have checked the performance of the null models on randomly rewired SB reference networks, where SB is intentionally reduced (fig. S2A). As expected, none of the null models detect SB or WB in this case for triangles and squares. However, the signed rewire null model generally detects larger z scores in these randomized networks than the STP null model. As an even more extreme test, we also considered reversing the signs of the SB reference network, leading to the network “SB rev” (fig. S2B). No method finds SB or WB in this case, and only the STP null model is consistent with the intended structure. Once again, we conclude that matching the topology and the signed degree sequence in the null model is important. Now that we have established the STP null model as the only suitable null model, we turn to discuss some of the potential wiring mechanisms behind our observations with STP.

Potential mechanisms behind signed patterns

As a starting point, ideas of node-copying mechanisms have been proposed to potentially explain network patterns, including the formation of square graphlets (44–47). Here, we first generalize the node-copying mechanism to signed networks, where a new node can replicate (some or all) of the connections of another node, also copying the corresponding edge signs. As illustrated in Fig. 6A, when a node (A′) duplicates the edges along with their associated signs from another node (A), it naturally leads to some of the balanced squares: − − − −, + + − −, and + + + +. Note that the + − + − pattern cannot be created this way. This is in line with the Bitcoin-OTC dataset, where SB is detected apart from the underrepresented + − + − pattern. Yet, all other results in Fig. 5 show an overrepresentation of + − + − graphlets, calling for mechanisms that can potentially explain it. Moreover, signed triangle graphlets are not necessarily explained by a node-copying mechanism, depending also on the initial conditions.

Fig. 6. Illustration of signed copying mechanisms.

(A) Signed node copying. Node A′ copies the edges and signs from node A, forming balanced squares except the + − + − square. (B) Edge copying. Existing connected nodes copy each other’s edges with preserved signs (if positively connected) or reversed signs (if negatively connected). (C) Edge copying with node addition. New nodes are added to the network and copy all or some of the edges from a connected node following the edge-copying rules. This process eventually forms triangles, squares (including the + − + − square BCDEB), squareZs, and squareXs. Blue lines indicate positive edges and red lines indicate negative edges. Copied edges are shown in dashed lines.

Thus, as an alternative solution, we propose a simple edge-copying mechanism (Fig. 6B). Here, in each step, a node can copy all or some of the edges from its neighbor. Nodes connected by positive edges are assumed to copy each other’s attitudes toward other nodes, just like in the node-copying mechanism. A key difference is that negatively connected nodes are proposed to replicate the edges of their foes with signs reversed. The edge-copying mechanism initially leads to balanced triangles as shown in Fig. 6B and eventually leads to larger balanced graphlets (fig. S4). We implement the proposed edge-copying mechanism together with node addition to generate a simple edge-copying reference network, “EC ref” (see Materials and Methods). The EC reference network starts with a + + + triangle, with nodes sequentially added to the network (see examples in Fig. 6C and fig. S5), and eventually leads to all possible balanced patterns, including the + − + − square graphlet that is missing from the node-copying mechanism. This simple EC reference network only includes balanced patterns (see the proof in the Supplementary Materials) and thus all unbalanced patterns are underrepresented when compared to all null models.

DISCUSSION

Graphlet (or motif) statistics provide key insights into the mechanisms of network wiring and function. However, it is important to interpret the results in the context of an adequate null model. Up until now, signed network null models had a crucial shortcoming as they either ignored the signed degree preferences of the nodes or the network topology. First, we have shown that matching the signed degrees is critically important in heterogeneous signed networks. Second, we have shown that keeping the network topology intact is useful to disentangle purely topological effects from those related to the balance of signed patterns. As a solution, we proposed the STP null model that preserves both the signed degrees and the network topology using the maximum entropy framework. We found that the STP null model provides more consistent results across signed social networks than previous null models, favoring SB at both the level of triangles and four-node graphlets, with the potential exception of the + − + − graphlet for Bitcoin-OTC. The results suggest that, only when the network topology and node sign preferences are considered, real social networks exhibit a preference for balanced patterns while avoiding unbalanced patterns. This prevalent signature of SB is obscured when topology or signed node degrees are not properly incorporated into the null models. More broadly, the results also highlight the importance of the space of allowed connections. Assuming all connections are possible drastically changes interpretations from a fixed topology. In reality, the topology is neither completely arbitrary nor fixed. Gathering information on the structure of allowed connections would be useful for further understanding the wiring mechanisms of real-world social networks. The STP model could readily incorporate such data on the allowed connections as hard constraints.

In addition, we have introduced an edge-copying mechanism that has the potential to form balanced triangles and four-node graphlets, while providing flexibility in matching the (un)balanced square patterns. Note that the edge-copying mechanism provides a simple, yet plausible, example of forming balanced patterns at the levels of triangles, squareZ and squareX graphlets, questioning the current paradigm that ignores four-node graphlet mechanisms (36). Even so, without following the detailed dynamical processes of these networks, we cannot conclude that the edge-copying mechanism is actually at play in these networks. As a complication, when introducing a new node to the network, it can potentially engage in the formation of multiple graphlets simultaneously, both with and without participating in higher-order interactions (48, 49). Furthermore, both node-copying and edge-copying mechanisms may happen simultaneously in real social networks. Note that signed node copying is more plausible when a node can access information on the signed edges of other unrelated nodes, like in the Slashdot and Bitcoin-OTC datasets. Under other circumstances, for online social networks (50), individuals may have access to strangers’ friend lists but may have at most limited access to strangers’ blacklists (or foes). Therefore, we expect that the observed patterns might depend on factors like the feasibility and accessibility of copying strangers’ edges. Yet, the consistent SB observed in most datasets indicates a potentially widespread common mechanism, such as edge copying. Any exception detected in the square results may offer clues to understand key aspects of user behavior across platforms. For example, the underrepresentation of + − + − in the Bitcoin-OTC dataset might support the hypothesis that rather than reversing signs from distrustful users, users may decide not to copy those edges instead, leading to fewer + − + − squares.

Here, our primary narrative was to identify null models that are as close to the real data as possible without capturing the wiring mechanisms. As a key step, we proposed to disassociate the purely topological effects from those related to social balance. A substantial difference from a signed null model is then informative on the mechanisms related to balance. Note, however, that this is not the only narrative to consider. The presented null models are also valuable ingredients of an alternative framework where the aim is to match the mechanisms behind real data as closely as possible by the null models. In this sense, the rewire null model corresponds to the scenarios, where individuals can choose both whom they interact with and how. On the other hand, the sign shuffle null model reflects situations where individuals cannot choose whom they interact with but they can decide on how, as long as they have no heterogeneous sign preferences. In situations matching the signed rewire null model, individuals can choose whom to interact with, but with various tendencies of forming positive or negative edges. Last, the STP null model would capture scenarios where individuals cannot choose whom to interact with but they have inherently different tendencies toward forming either more positive or negative edges.

One limitation of this study is that the STP null model preserves the average signed node degree rather than the exact node degree. Consequently, it results in a “canonical ensemble” and is therefore expected to be statistically somewhat less powerful than “microcanonical” null models that preserve the signed node degree exactly (51). Although formulating a null model that precisely preserves signed degrees is expected to introduce a considerably more challenging combinatorial problem, it is a promising avenue that merits further investigation.

Another interesting point that would require further investigation is the appropriate statistical threshold (z score or P value). Especially in very large networks, the standard choice of ∣z∣ > 2 must be revisited as the tail of the distributions might deviate from a normal expectation. Going beyond triangle patterns also quickly increases the number of hypothesis tests, especially if one would consider five- or six-node patterns, potentially leading to spurious results, calling for a controlled family-wise error rate. Besides, while considering higher-order graphlets is a meaningful way to better decipher the complexity of social networks, limited by the substantially increased computational complexity, we only considered three- and four-node graphlets in this study. As an illustration of the computational complexity, within the Slashdot dataset, we encounter 571 million triangles, categorized into four distinct cases, along with 54 billion four-node graphlets that are classified into 31 distinct cases (graphlets 5 to 35 in Figs. 4 and 5).
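As one simple illustration of family-wise error control, a Bonferroni correction divides the significance level by the number of tested patterns. The sketch below converts the corrected level into a two-sided z threshold under a normal approximation. Both the choice of Bonferroni and the normal approximation are assumptions made here for illustration; as noted above, the tails of the null distributions may deviate from normality.

```python
from statistics import NormalDist

def bonferroni_z_threshold(n_tests, alpha=0.05):
    """Two-sided z threshold keeping the family-wise error rate below alpha
    across n_tests hypothesis tests, assuming normally distributed counts."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
```

With a single test and alpha = 0.05 this gives the familiar value of about 1.96, and the threshold grows as more patterns are tested.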

STP randomization has widespread potential applications and extensions. To start, it can provide a more adequate alternative null model to quantify balance (24, 36, 52–54) or measure polarization (2, 4–6, 8) in social networks. Besides, the observed SB in social networks, as indicated by the STP model, shows potential to advance current sign prediction methods (55–59). In addition, the STP null model can be extended to directed and weighted networks (60, 61), with the potential to contrast large-scale data against alternatives to SB, such as status theory (1).

Note that upon completion of our manuscript, we came across a parallel study that overlaps with our work (62). The presented SCM-FT null model shares the same mathematical formulation as the STP null model, apart from details of the numerical implementation, achieving consistent results at the triangle level.

MATERIALS AND METHODS

Signed social network datasets

The four large signed social networks analyzed in this study were downloaded from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/): (i) Bitcoin-Alpha, the trust/distrust network among people who trade Bitcoin on a platform called Bitcoin Alpha; (ii) Bitcoin-OTC, the trust/distrust network among people who trade Bitcoin on a platform called Bitcoin OTC (34); (iii) Slashdot, friend/foe network of the technological news site Slashdot released in February 2009; and (iv) Epinions, who-trust-whom online social network of a general consumer review site Epinions. The smaller-scale Congress network is from (32). More details of the construction of the datasets can be found in (1, 33). Network edges are considered to be undirected. This process leads to only a very limited number of edge sign inconsistencies. Such inconsistent edges are disregarded in our analysis, together with any self-loops (52). Only the largest connected component of each network is considered.
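The preprocessing steps listed above can be sketched as follows, assuming the raw data arrive as directed (source, target, sign) triples with integer node labels. The function name and input format are illustrative; the original datasets may require additional parsing.

```python
from collections import defaultdict, deque

def preprocess(directed_edges):
    """Build an undirected signed network from (u, v, sign) triples, dropping
    self-loops and sign-inconsistent pairs, and keep the largest component."""
    signs = defaultdict(set)
    for u, v, s in directed_edges:
        if u != v:                                  # disregard self-loops
            signs[(min(u, v), max(u, v))].add(s)
    # Keep only node pairs whose recorded signs agree.
    edges = {e: next(iter(ss)) for e, ss in signs.items() if len(ss) == 1}
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, largest = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])       # breadth-first search
        while queue:
            x = queue.popleft()
            for y in adj[x] - comp:
                comp.add(y)
                queue.append(y)
        seen |= comp
        if len(comp) > len(largest):
            largest = comp
    return {e: s for e, s in edges.items() if e[0] in largest}
```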

Construction of the SB reference network

We create a simple model network referred to as the SB reference network, following Harary’s theorem of SB (40). The model network includes 120,000 nodes, on par with the largest studied datasets. Note that the primary objective of the SB reference network is not to emulate actual social networks but to serve as a standardized benchmark for evaluating whether null models can successfully identify SB in a network that is strongly balanced by construction.
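Harary’s theorem states that a signed network is strongly balanced exactly when its nodes can be split into two groups with only positive edges inside each group and only negative edges between them. As a minimal sketch, this can be checked with a breadth-first two-coloring. The function name `two_group_split` and the `(u, v, sign)` edge format are hypothetical and introduced only for illustration.

```python
from collections import defaultdict, deque


def two_group_split(signed_edges):
    """Return {node: 0 or 1} if the signed graph is strongly balanced, else None.

    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}, undirected.
    """
    adj = defaultdict(list)
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))

    group = {}
    for start in adj:
        if start in group:
            continue
        # Each connected component is colored independently.
        group[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # Positive edge: same group. Negative edge: opposite group.
                expected = group[u] if s > 0 else 1 - group[u]
                if v not in group:
                    group[v] = expected
                    queue.append(v)
                elif group[v] != expected:
                    return None  # a cycle with an odd number of negative edges
    return group
```

A network built with positive edges only within groups and negative edges only between groups, as described for the SB reference network, passes this check by construction.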

The nodes in the SB reference network are first divided into two equal groups. We then generate two degree sequences from power-law degree distributions, with exponents of 2 and 3 for the positive and negative degree sequences, respectively. To introduce a moderate level of correlation between the positive and negative degree sequences, we first arrange both sequences in ascending order. We then exchange each degree value in the positive degree sequence with another randomly chosen degree value from the same sequence, with a probability of 0.2 for each exchange. The resulting positive and negative degree sequences have a correlation coefficient of 0.4. The negative degree sequence is then used to generate negative edges between members of different groups using the configuration model, while the positive degree sequence is used to generate positive edges between members of the same group. The resulting SB reference network has a density and positive edge ratio comparable to those of real-life social networks, as shown in Table 1.
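The degree-sequence step above can be sketched as follows. This is a rough illustration, not the authors' code: the sampling scheme (a floored continuous inverse transform), the bounds `k_min` and `k_max`, and all function names are assumptions, since the text does not specify them. The sketch is therefore not expected to reproduce the reported correlation of 0.4 exactly, and it stops before the configuration-model wiring.

```python
import random


def sample_power_law_degrees(n, exponent, k_min, k_max, rng):
    """Draw n integer degrees with P(k) roughly proportional to k**(-exponent).

    Uses a floored continuous inverse transform, capped at k_max (assumption).
    """
    degrees = []
    for _ in range(n):
        u = rng.random()  # in [0, 1), so 1 - u is in (0, 1]
        k = k_min * (1.0 - u) ** (-1.0 / (exponent - 1.0))
        degrees.append(min(int(k), k_max))
    return degrees


def partially_shuffle(sorted_values, swap_prob, rng):
    """Swap each entry with a uniformly chosen entry with probability swap_prob."""
    values = list(sorted_values)
    n = len(values)
    for i in range(n):
        if rng.random() < swap_prob:
            j = rng.randrange(n)
            values[i], values[j] = values[j], values[i]
    return values


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
pos_sorted = sorted(sample_power_law_degrees(1000, 2.0, 1, 1000, rng))
neg_sorted = sorted(sample_power_law_degrees(1000, 3.0, 1, 1000, rng))
# Node i receives pos_degrees[i] positive stubs and neg_sorted[i] negative stubs.
pos_degrees = partially_shuffle(pos_sorted, 0.2, rng)
correlation = pearson(pos_degrees, neg_sorted)
```

Without the shuffling step, the two sorted sequences are maximally rank-correlated; the exchange probability controls how much of that correlation is removed.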

Construction of the EC network

We use the edge-copying mechanism to introduce nodes into an initial network, thereby constructing a reference EC network. We use a + + + triangle as the initial condition and subsequently add nodes to the network one at a time. Each new node establishes a connection with a randomly selected existing node (45). The sign of this connection is positive with probability q = 0.9 to match the typical positive ratio of 0.75 to 0.92 in real networks (see Table 1). In addition, every new node connects with each neighbor of the selected node with a probability P = 0.45 to match the sparsity of real networks. When a new node establishes a connection with the selected node via a positive (negative) edge, it keeps (reverses) the sign of the copied edge. The constructed EC reference network has 120,000 nodes, roughly the size of the largest studied real networks.
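The growth rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `grow_edge_copying_network`, the parameter name `p_copy` (standing in for P), and the nested-dictionary representation are hypothetical, and the target node is assumed to be drawn uniformly from the existing nodes.

```python
import random


def grow_edge_copying_network(n_nodes, q, p_copy, seed=0):
    """Grow a signed network by edge copying, starting from a + + + triangle.

    Returns sign[u][v] = +1 or -1 for every undirected edge (stored both ways).
    """
    rng = random.Random(seed)
    sign = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1}}
    for new in range(3, n_nodes):
        target = rng.randrange(new)          # uniformly chosen existing node
        s = 1 if rng.random() < q else -1    # sign of the edge new-target
        links = {target: s}
        for nbr, s_target_nbr in sign[target].items():
            if rng.random() < p_copy:
                # Positive link to the target keeps the copied sign,
                # negative link reverses it.
                links[nbr] = s * s_target_nbr
        sign[new] = links
        for nbr, s_new in links.items():
            sign[nbr][new] = s_new
    return sign
```

In this sketch, every triangle formed by a new node, its target, and a copied neighbor has an even number of negative edges, since the sign of the copied edge is the product of the other two.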

Apart from the EC reference network shown in the main text, we generated two additional networks using different P and q values that lead to densities and positive ratios comparable to those of real social networks (table S2). Figure S6 illustrates the degree distributions of the generated EC networks, which roughly follow a power law. Different EC networks yield consistent results regarding balance, as depicted in fig. S7.

Standard null models

In the rewire null model, we randomly select two edges, AB and CD, that span four different nodes and attempt to swap them, with equal probability, to either AD, CB or AC, BD. Such swap attempts are aborted if any of the resulting edges already exists in the network. To achieve sufficient network randomization, we perform 40E edge swap attempts, where E is the number of edges in the network. In the signed rewire null model, we use the same method as in the rewire null model but only select pairs of edges with the same sign, thus preserving the signed degrees of each node. To prevent multi-edges with both positive and negative signs, we prohibit the swap if it would result in edges with contradictory signs.
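A minimal sketch of the signed rewire procedure is given below. The function name `signed_rewire` and the `(u, v, sign)` edge format are assumptions for illustration. In this sketch, a draw of two edges with different signs simply counts as a failed attempt, which is one possible reading of the description above; passing edges that all carry the same sign reduces it to plain topology rewiring.

```python
import random


def signed_rewire(signed_edges, n_attempts, seed=0):
    """Double edge swaps restricted to pairs of edges with the same sign.

    signed_edges: list of (u, v, sign), undirected, at least two edges.
    Every node keeps its positive and negative degree.
    """
    rng = random.Random(seed)
    edge_list = [(u, v, s) for u, v, s in signed_edges]
    # All occupied node pairs, regardless of sign: forbidding these prevents
    # both duplicate edges and pairs carrying contradictory signs.
    present = {frozenset((u, v)) for u, v, _ in edge_list}
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b, s1 = edge_list[i]
        c, d, s2 = edge_list[j]
        if s1 != s2 or len({a, b, c, d}) < 4:
            continue  # need equal signs and four different nodes
        # Choose between (AD, CB) and (AC, BD) with equal probability.
        if rng.random() < 0.5:
            new1, new2 = (a, d), (c, b)
        else:
            new1, new2 = (a, c), (b, d)
        if frozenset(new1) in present or frozenset(new2) in present:
            continue  # abort: a resulting edge already exists
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.add(frozenset(new1))
        present.add(frozenset(new2))
        edge_list[i] = (new1[0], new1[1], s1)
        edge_list[j] = (new2[0], new2[1], s1)
    return edge_list
```

A call such as `signed_rewire(edges, 40 * len(edges))` mirrors the number of swap attempts stated above.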

In the sign shuffle null model, we randomly assign positive or negative signs to each edge, while preserving the exact total number of positive and negative edges.
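As a minimal sketch, the sign shuffle amounts to permuting the list of signs over a fixed edge list, which keeps the topology and the exact number of positive and negative edges. The function name `sign_shuffle` and the `(u, v, sign)` edge format are hypothetical.

```python
import random


def sign_shuffle(signed_edges, seed=0):
    """Randomly permute the edge signs while keeping the topology fixed."""
    rng = random.Random(seed)
    edges = list(signed_edges)
    signs = [s for _, _, s in edges]
    rng.shuffle(signs)  # a permutation preserves the count of each sign
    return [(u, v, s) for (u, v, _), s in zip(edges, signs)]
```

Unlike the signed rewire model, this procedure does not preserve the signed degrees of individual nodes.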

The STP null model

The STP null model is based on the maximum entropy framework (27–31). We extend the application of the maximum entropy framework to signed networks, enabling the simultaneous preservation of both the network topology and the signed degree sequence. A signed network $G_0$ is first divided into two subgraphs, namely, the positive ($G_p$) and negative ($G_n$) subgraphs that include all the positive or negative edges in $G_0$. Keeping the topology intact means that randomizing the negative subgraph readily provides the positive subgraph. Each negative subgraph instance $G_n^r$ is assigned a probability $P(G_n^r)$ that maximizes the Shannon-Gibbs entropy

$$S = -\sum_{G_n^r} P(G_n^r) \ln P(G_n^r) \qquad (2)$$

subject to the constraints $\sum_{G_n^r} P(G_n^r) = 1$ and the average negative node degree $\langle k_i(G_n^r) \rangle = k_i(G_n)$. Considering all constraints leads to the function

$$L = S + \beta \left[ 1 - \sum_{G_n^r} P(G_n^r) \right] + \sum_i \theta_i \left[ k_i(G_n) - \sum_{G_n^r} P(G_n^r)\, k_i(G_n^r) \right] \qquad (3)$$

where $\beta$ and $\theta_i$ are the Lagrange multipliers of the constraints. The solution is found by setting the derivatives of $L$ with respect to $P(G_n^r)$ and the Lagrange multipliers to 0. Solving the equations leads to $P(G_n^r) = e^{-H(G_n^r)}/Z$, with the Hamiltonian $H(G_n^r) = \sum_i \theta_i k_i(G_n^r)$ and the partition function $Z = \sum_{G_n^r} e^{-H(G_n^r)}$. We define an element in the negative adjacency matrix as $\sigma_{ij} = 1$ if node $i$ is negatively connected to node $j$, otherwise $\sigma_{ij} = 0$. The Hamiltonian can then be expressed as

$$H = \sum_i \theta_i k_i = \sum_{i<j} (\theta_i + \theta_j)\, \sigma_{ij} \qquad (4)$$

The partition function is then

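$$Z = \sum_{\{\sigma_{ij}\}} \exp\left[ -\sum_{i<j} (\theta_i + \theta_j)\, \sigma_{ij} \right] = \prod_{i<j} \left[ 1 + e^{-(\theta_i + \theta_j)} \right] \qquad (5)$$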
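As a brief sketch of why the sum in Eq. 5 reduces to a product: each $\sigma_{ij}$ takes the value 0 or 1 independently of the others, so the sum over configurations factorizes edge by edge. The same factorization gives the average of a single $\sigma_{ij}$. The intermediate steps below are filled in for illustration and use only the symbols defined above.

```latex
\begin{aligned}
Z &= \sum_{\{\sigma_{ij}\}} \prod_{i<j} e^{-(\theta_i+\theta_j)\sigma_{ij}}
   = \prod_{i<j} \sum_{\sigma_{ij}\in\{0,1\}} e^{-(\theta_i+\theta_j)\sigma_{ij}}
   = \prod_{i<j} \left[ 1 + e^{-(\theta_i+\theta_j)} \right], \\
\langle \sigma_{ij} \rangle
  &= \frac{e^{-(\theta_i+\theta_j)}}{1 + e^{-(\theta_i+\theta_j)}}
   = \frac{1}{1 + e^{\theta_i+\theta_j}} .
\end{aligned}
```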

The resulting probability of selecting an existing edge in $G_0$ to be part of $G_n^r$ between nodes $i$ and $j$ is simply (63–65)

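$$p_{ij} = \langle \sigma_{ij} \rangle = \sum_{G_n^r} \sigma_{ij} P(G_n^r) = \frac{1}{1 + e^{\theta_i + \theta_j}} = \frac{1}{1 + \alpha_i \alpha_j} \qquad (6)$$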

where we denote $\alpha_i = e^{\theta_i}$. The values of $\alpha_i$ can be found efficiently through the iterations

$$\alpha_i = \frac{1}{k_i} \sum_{j,\,(i,j) \in G_0} \frac{1}{\alpha_j + 1/\alpha_i} \qquad (7)$$

We use the initial condition $\alpha_i^{(0)} = 1$ and we stop the iteration when the maximum relative change of $\alpha_i$ is less than $10^{-3}$ between two consecutive iterations or when the number of iterations reaches the maximum of $10^4$. Note that although here we randomize the negative subgraph and set the remaining network as the positive subgraph, randomizing the positive subgraph first will give the same result.
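The iteration can be sketched as follows. Requiring the expected negative degree, the sum of $1/(1+\alpha_i\alpha_j)$ over the neighbors $j$ of node $i$ in the original topology, to equal the observed negative degree and factoring out $1/\alpha_i$ gives the fixed-point form used here, so each update rescales $\alpha_i$ by the ratio of the current expected negative degree to its target. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, all values are updated synchronously from the previous iteration (an assumption), and every node is assumed to have a negative degree strictly between 0 and its total degree, so that no $\alpha_i$ has to run to 0 or infinity.

```python
import random


def solve_alpha(neighbors, k_neg, tol=1e-3, max_iter=10**4):
    """Fixed-point iteration for alpha_i, starting from alpha_i = 1.

    neighbors: {node: list of adjacent nodes in the original topology}
    k_neg:     {node: target negative degree}, assumed 0 < k_neg[i] < degree(i)
    """
    alpha = {i: 1.0 for i in neighbors}
    for _ in range(max_iter):
        new = {
            i: sum(1.0 / (alpha[j] + 1.0 / alpha[i]) for j in neighbors[i]) / k_neg[i]
            for i in neighbors
        }
        # Maximum relative change between two consecutive iterations.
        change = max(abs(new[i] - alpha[i]) / alpha[i] for i in neighbors)
        alpha = new
        if change < tol:
            break
    return alpha


def negative_edge_probability(alpha, i, j):
    """Probability that the existing edge (i, j) is negative."""
    return 1.0 / (1.0 + alpha[i] * alpha[j])


def sample_signs(edges, alpha, rng):
    """Draw one randomized sign assignment on the fixed topology."""
    return [
        (u, v, -1 if rng.random() < negative_edge_probability(alpha, u, v) else 1)
        for u, v in edges
    ]
```

Because the edge probabilities are independent, one random network instance is obtained by a single pass over the edge list; the signed degrees are then preserved on average rather than exactly in each instance.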

Empirical P values

We use empirical P values to assess the significance of the graphlet results. Specifically, we compute the one-sided empirical P value as P = (r + 1)/(n + 1), where n is the number of random samples, set to 1000 in this study, and r is the number of samples whose graphlet frequency is greater (less) than or equal to the observed frequency (66). A P value below 0.01 indicates that the observed graphlet frequency $n_{obs}$ is significantly higher (lower) than the average graphlet frequency $\langle n_{rand} \rangle$ in the random samples.
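The formula above translates directly into a short function. The name `empirical_p_value` and the `tail` argument are hypothetical and chosen only for this sketch.

```python
def empirical_p_value(observed, random_counts, tail="higher"):
    """One-sided empirical P value, P = (r + 1) / (n + 1).

    tail="higher": r counts random samples >= observed (over-representation).
    tail="lower":  r counts random samples <= observed (under-representation).
    """
    n = len(random_counts)
    if tail == "higher":
        r = sum(x >= observed for x in random_counts)
    elif tail == "lower":
        r = sum(x <= observed for x in random_counts)
    else:
        raise ValueError("tail must be 'higher' or 'lower'")
    return (r + 1) / (n + 1)
```

With n = 1000 random samples, the smallest attainable value is 1/1001, reached when no random sample is as extreme as the observation.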

Acknowledgments

We thank H. S. Ansell and A. Salova for comments and discussion.

Funding: The authors acknowledge that they received no funding in support of this research.

Author contributions: Conceptualization: B.H. and I.A.K. Methodology: B.H. and I.A.K. Investigation: B.H. Visualization: B.H. Supervision: I.A.K. Writing (all): B.H. and I.A.K.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions of the paper are present in the paper and/or the Supplementary Materials. The code and processed data can be accessed at https://zenodo.org/record/8428724.

Supplementary Materials

This PDF file includes:

Supplementary Text

Figs. S1 to S7

Tables S1 and S2

sciadv.adj0104_sm.pdf (730.7KB, pdf)

REFERENCES AND NOTES

  • 1. J. Leskovec, D. Huttenlocher, J. Kleinberg, Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10 (ACM Press, 2010), p. 1361.
  • 2. Z. Huang, A. Silva, A. Singh, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (2022), pp. 390–400.
  • 3. Askarisichani O., Lane J. N., Bullo F., Friedkin N. E., Singh A. K., Uzzi B., Structural balance emerges and explains performance in risky decision-making. Nat. Commun. 10, 2648 (2019).
  • 4. Garimella V. R. K., Weber I., A long-term analysis of polarization on Twitter. ICWSM 11, 528–531 (2017).
  • 5. N. Gillani, A. Yuan, M. Saveski, S. Vosoughi, D. Roy, Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18 (ACM Press, 2018), pp. 823–831.
  • 6. J. A. Tucker, A. Guess, P. Barbera, C. Vaccari, A. Siegel, S. Sanovich, D. Stukal, B. Nyhan, Social Media, political polarization, and political disinformation: A review of the scientific literature. SSRN Journal (2018).
  • 7. B. Gross, S. Havlin, B. Barzel, Dense network motifs enhance dynamical stability (2023). Preprint: 10.48550/arXiv.2304.12044.
  • 8. Rácz M. Z., Rigobon D. E., Towards consensus: Reducing polarization by perturbing social networks. IEEE Trans Netw Sci Eng 10, 3450–3464 (2023).
  • 9. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U., Network motifs: Simple building blocks of complex networks. Science 298, 824–827 (2002).
  • 10. Ahmed N. K., Neville J., Rossi R. A., Duffield N. G., Willke T. L., Graphlet decomposition: Framework, algorithms, and applications. Knowl. Inf. Syst. 50, 689–722 (2017).
  • 11. Pržulj N., Biological network comparison using graphlet degree distribution. Bioinformatics 23, e177–e183 (2007).
  • 12. Shen-Orr S. S., Milo R., Mangan S., Alon U., Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31, 64–68 (2002).
  • 13. Tran N. H., Choi K. P., Zhang L., Counting motifs in the human interactome. Nat. Commun. 4, 2241 (2013).
  • 14. P. Li, O. Milenkovic, Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17 (Curran Associates Inc., 2017), pp. 2305–2315.
  • 15. J. Chen, W. Hsu, M. L. Lee, S.-K. Ng, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’06 (Association for Computing Machinery, 2006), pp. 106–115.
  • 16. Sporns O., Kötter R., Motifs in brain networks. PLOS Biol. 2, e369 (2004).
  • 17. J. Ugander, L. Backstrom, J. Kleinberg, Proceedings of the 22nd International Conference on World Wide Web, WWW ’13 (Association for Computing Machinery, 2013), pp. 1307–1318.
  • 18. Cartwright D., Harary F., Structural balance: A generalization of Heider’s theory. Psychol. Rev. 63, 277–293 (1956).
  • 19. Ferreira E., Orbe S., Ascorbebeitia J., Álvarez Pereira B., Estrada E., Loss of structural balance in stock markets. Sci. Rep. 11, 12230 (2021).
  • 20. H. Situngkir, D. Khanafiah, Social Balance Theory: Revisiting Heider’s Balance Theory for many agents, Industrial Organization (2004).
  • 21. Pham T. M., Alexander A. C., Korbel J., Hanel R., Thurner S., Balance and fragmentation in societies with homophily and social balance. Sci. Rep. 11, 17188 (2021).
  • 22. Szell M., Lambiotte R., Thurner S., Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci. U.S.A. 107, 13636–13641 (2010).
  • 23. Rao A. R., Jana R., Bandyopadhyay S., A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals. Sankhya Indian J. Stat. Series A 58, 225–242 (1996).
  • 24. Singh R., Adhikari B., Measuring the balance of signed networks and its application to sign prediction. J. Stat. Mech. 2017, 063302 (2017).
  • 25. Feng D., Altmeyer R., Stafford D., Christakis N. A., Zhou H. H., Testing for balance in social networks. J. Am. Stat. Assoc. 117, 156–174 (2022).
  • 26. A. L. Barabási, M. Pósfai, Network Science (Cambridge Univ. Press, 2016).
  • 27. Menichetti G., Remondini D., Panzarasa P., Mondragón R. J., Bianconi G., Weighted multiplex networks. PLOS ONE 9, e97857 (2014).
  • 28. Becatti C., Caldarelli G., Saracco F., Entropy-based randomization of rating networks. Phys. Rev. E 99, 022306 (2019).
  • 29. Squartini T., Garlaschelli D., Analytical maximum-likelihood method to detect patterns in real networks. New J. Phys. 13, 083001 (2011).
  • 30. Squartini T., Caldarelli G., Cimini G., Gabrielli A., Garlaschelli D., Reconstruction methods for networks: The case of economic and financial systems. Phys. Rep. 757, 1–47 (2018).
  • 31. Cimini G., Squartini T., Saracco F., Garlaschelli D., Gabrielli A., Caldarelli G., The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58–71 (2019).
  • 32. M. Thomas, B. Pang, L. Lee, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing - EMNLP ’06 (Association for Computational Linguistics, 2006), p. 327.
  • 33. S. Kumar, F. Spezzano, V. S. Subrahmanian, C. Faloutsos, 2016 IEEE 16th International Conference on Data Mining (ICDM) (IEEE, 2016), pp. 221–230.
  • 34. S. Kumar, B. Hooi, D. Makhija, M. Kumar, C. Faloutsos, V. S. Subrahmanian, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (2018), pp. 333–341.
  • 35. Li A. W., Xiao J., Xu X.-K., Constructing refined null models for statistical analysis of signed networks. Chinese Phys. B 30, 038901 (2021).
  • 36. Kirkley A., Cantwell G. T., Newman M. E. J., Balance in signed networks. Phys. Rev. E 99, 012320 (2019).
  • 37. D’Agostino R. B., An omnibus test of normality for moderate and large size samples. Biometrika 58, 341–348 (1971).
  • 38. Davis J. A., Clustering and structural balance in graphs. Hum. Relat. 20, 181–187 (1967).
  • 39. Esmailian P., Abtahi S. E., Jalili M., Mesoscopic analysis of online social networks: The role of negative ties. Phys. Rev. E 90, 042817 (2014).
  • 40. Harary F., On the measurement of structural balance. Behav. Sci. 4, 316–323 (1959).
  • 41. Piraveenan M., Wimalawarne K., Kasthurirathn D., Centrality and composition of four-node motifs in metabolic networks. Procedia Comput. Sci. 18, 409–418 (2013).
  • 42. Ye S., Li Q., Mei G., Liu S., Pan L., How the four-nodes motifs work in heterogeneous node representation? A case study on aminer. World Wide Web 26, 1707–1729 (2023).
  • 43. Dey A. K., Gel Y. R., Poor H. V., What network motifs tell us about resilience and reliability of complex networks. Proc. Natl. Acad. Sci. U.S.A. 116, 19368–19373 (2019).
  • 44. Kovács I. A., Luck K., Spirohn K., Wang Y., Pollis C., Schlabach S., Bian W., Kim D. K., Kishore N., Hao T., Calderwood M. A., Vidal M., Barabási A. L., Network-based prediction of protein interactions. Nat. Commun. 10, 1240 (2019).
  • 45. Bhat U., Krapivsky P. L., Lambiotte R., Redner S., Densification and structural transitions in networks that grow by node copying. Phys. Rev. E 94, 062302 (2016).
  • 46. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, A. S. Tomkins, Computing and Combinatorics, G. Goos, J. Hartmanis, J. Van Leeuwen, T. Asano, H. Imai, D. T. Lee, S.-i. Nakano, T. Tokuyama, Eds. (Springer Berlin Heidelberg, 1999), vol. 1627, pp. 1–17.
  • 47. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal, Stochastic models for the Web graph, Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 57–65 (2000).
  • 48. Civilini A., Anbarci N., Latora V., Evolutionary game model of group choice dilemmas on hypergraphs. Phys. Rev. Lett. 127, 268301 (2021).
  • 49. Battiston F., Amico E., Barrat A., Bianconi G., Ferraz de Arruda G., Franceschiello B., Iacopini I., Kéfi S., Latora V., Moreno Y., Murray M. M., Peixoto T. P., Vaccarino F., Petri G., The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
  • 50. M. Cramer, J. Pang, Y. Zhang, Proceedings of the 20th ACM Symposium on Access Control Models and Technologies (ACM, 2015), pp. 75–86.
  • 51. Neal Z. P., Domagalski R., Sagan B., Comparing alternatives to the fixed degree sequence model for extracting the backbone of bipartite projections. Sci. Rep. 11, 23929 (2021).
  • 52. Facchetti G., Iacono G., Altafini C., Computing global structural balance in large-scale signed social networks. Proc. Natl. Acad. Sci. U.S.A. 108, 20953–20958 (2011).
  • 53. Estrada E., Benzi M., Walk-based measure of balance in signed networks: Detecting lack of balance in social networks. Phys. Rev. E 90, 042802 (2014).
  • 54. Marvel S. A., Strogatz S. H., Kleinberg J. M., Energy landscape of social balance. Phys. Rev. Lett. 103, 198701 (2009).
  • 55. J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links in online social networks. Proceedings of the 19th International Conference on World Wide Web - WWW ’10 (ACM Press, 2010), p. 641.
  • 56. Khodadadi A., Jalili M., Sign prediction in social networks based on tendency rate of equivalent micro-structures. Neurocomputing 257, 175–184 (2017).
  • 57. Liu S.-Y., Xiao J., Xu X.-K., Sign prediction by motif naive Bayes model in social networks. Inf. Sci. 541, 316–331 (2020).
  • 58. Liu S. Y., Xiao J., Xu X.-K., Improving sign prediction of network embedding by adding motif features. Physica A Stat. Mech. Appl. 593, 126966 (2022).
  • 59. Fenyves B. G., Szilágyi G. S., Vassy Z., Sőti C., Csermely P., Synaptic polarity and sign-balance prediction using gene expression data in the Caenorhabditis elegans chemical synapse neuronal connectome network. PLOS Comput. Biol. 16, e1007974 (2020).
  • 60. Yao Q., Evans T. S., Christensen K., How the network properties of shareholders vary with investor type and country. PLOS ONE 14, e0220965 (2019).
  • 61. Krawczyk M. J., Wołoszyn M., Gronek P., Kułakowski K., Mucha J., The Heider balance and the looking-glass self: Modelling dynamics of social relations. Sci. Rep. 9, 11202 (2019).
  • 62. A. Gallo, D. Garlaschelli, R. Lambiotte, F. Saracco, T. Squartini, Strong, weak or no balance? Testing structural hypotheses against real networks (2023). Preprint: 10.48550/arXiv.2303.07023.
  • 63. Kovács I. A., Barabási D. L., Barabási A. L., Uncovering the genetic blueprint of the C. elegans nervous system. Proc. Natl. Acad. Sci. U.S.A. 117, 33570–33577 (2020).
  • 64. Chatterjee A., Walters R., Shafi Z., Ahmed O. S., Sebek M., Gysi D., Yu R., Eliassi-Rad T., Barabási A. L., Menichetti G., Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat. Commun. 14, 1989 (2023).
  • 65. Hao B., Kovács I. A., A positive statistical benchmark to assess network agreement. Nat. Commun. 14, 2988 (2023).
  • 66. North B., Curtis D., Sham P., A note on the calculation of empirical P values from Monte Carlo procedures. Am. J. Hum. Genet. 71, 439–441 (2002).
