Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints

Nicolò Vallarano; Matteo Bruno; Emiliano Marchese; Giuseppe Trapani; Fabio Saracco; Giulio Cimini; Mario Zanon; Tiziano Squartini

doi:10.1038/s41598-021-93830-4

. 2021 Jul 27;11:15227. doi: 10.1038/s41598-021-93830-4

Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints

Nicolò Vallarano ¹, Matteo Bruno ¹, Emiliano Marchese ¹, Giuseppe Trapani ¹, Fabio Saracco ¹, Giulio Cimini ², Mario Zanon ¹, Tiziano Squartini ^1,^3,^✉

PMCID: PMC8316481 PMID: 34315920

Abstract

Exponential Random Graph Models (ERGMs) have gained increasing popularity over the years. Rooted into statistical physics, the ERGMs framework has been successfully employed for reconstructing networks, detecting statistically significant patterns in graphs, counting networked configurations with given properties. From a technical point of view, the ERGMs workflow is defined by two subsequent optimization steps: the first one concerns the maximization of Shannon entropy and leads to identify the functional form of the ensemble probability distribution that is maximally non-committal with respect to the missing information; the second one concerns the maximization of the likelihood function induced by this probability distribution and leads to its numerical determination. This second step translates into the resolution of a system of O(N) non-linear, coupled equations (with N being the total number of nodes of the network under analysis), a problem that is affected by three main issues, i.e. accuracy, speed and scalability. The present paper aims at addressing these problems by comparing the performance of three algorithms (i.e. Newton’s method, a quasi-Newton method and a recently-proposed fixed-point recipe) in solving several ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion. While Newton’s method performs best for relatively little networks, the fixed-point recipe is to be preferred when large configurations are considered, as it ensures convergence to the solution within seconds for networks with hundreds of thousands of nodes (e.g. the Internet, Bitcoin). We attach to the paper a Python code implementing the three aforementioned algorithms on all the ERGMs considered in the present work.

Subject terms: Complex networks, Statistical physics

Introduction

Over the last 20 years, network theory has emerged as a successful framework to address problems of scientific and societal relevance¹: examples of processes that are affected by the structural details of the underlying network are provided by the spreading of infectious diseases^2–4, opinion dynamics⁵, the propagation of losses during financial crises⁶, etc.

Within such a context, two needs have emerged quite naturally⁷: 1) detecting the topological properties of a real network that can be deemed as statistically significant - typically those higher-order properties (e.g. the assortativity, the clustering coefficient, etc.) that may be not explained by local features of the nodes such as the degrees, 2) inferring the relevant details of a given network structure in case only partial information is available. Both goals can be achieved by constructing a framework for defining benchmarks, i.e. synthetic configurations retaining only some of the properties of the original system - the so-called constraints - and, otherwise, being maximally random.

Two different kinds of approaches have been proposed so far, i.e. the microcanonical and the canonical ones. Microcanonical approaches^8–14 artificially generate many randomized variants of the observed network by enforcing the constraints in a ‘hard’ fashion, i.e. by creating configurations on each of which the constrained properties are identical to the empirical ones. On the other hand, canonical approaches^15–19 enforce constraints in a ‘soft’ fashion, by creating a set of configurations over which the constrained properties are on average identical to the empirical ones. Softening the requirement of matching the constraints has a clear advantage: allowing the mathematical expression for the probability of a generic configuration, $G$ , to be obtained analytically, as a function of the enforced constraints.

In this second case, a pivotal role is played by the formalism of the Exponential Random Graph Models (ERGMs)²⁰ whose popularity has steadily increased over the years. The ERGMs mathematical framework dates back to Gibbs’ (re)formulation of statistical mechanics and is based upon the variational principle known as maximum entropy, stating that the probability distribution that is maximally non-committal with respect to the missing information is the one maximizing the Shannon entropy²¹. This allows self-consistent inference to be made, by assuming maximal ignorance about the unknown degrees of freedom of the system.

In the context of network theory, the ERGMs framework has found a natural application to the resolution of the two aforementioned problems, i.e. 1) that of defining null models of the original network, in order to assess the significance of empirical patterns - against the hypothesis that the network structure is solely determined by the constraints themselves and 2) that of deriving the most probable configurations that are compatible with the available details about a specific network.

In both cases, after the functional form of the probability distribution has been identified, via the maximum entropy principle, one also needs to numerically determine it: to this aim, the likelihood maximization principle can be invoked, stating to maximize the probability of observing the actual configuration. This prescription typically translates into the resolution of a system of O(N) non-linear, coupled equations - with N representing the number of nodes of the network under analysis.

Problems like these are usually affected by the issues of accuracy, speed and scalability: the present paper aims at addressing them at once, by comparing the performance of three algorithms, i.e. Newton’s method, a quasi-Newton method and a recently-proposed fixed-point recipe^22,23, to solve a variety of ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion.

We would like to stress that, while the theoretical architecture of ERGMs is well established, an exhaustive study of the computational cost required for their numerical optimization, alongside with a comparison among the most popular algorithms designed for the task, is still missing. Our work aims at filling precisely this gap. Additionally, we provide a Python code implementing the three aforementioned recipes on all the ERGMs considered in the present work.

General theory

Canonical approaches aim at obtaining the mathematical expression for the probability of a generic configuration, $G$ , as a function of the observed constraints: ERGMs realize this by maximizing the Shannon entropy^15,16.

The Maximum Entropy Principle

Generally speaking, the problem to be solved in order to find the functional form of the probability distribution to be employed as a benchmark reads

\underset{P}{arg max} S [P]

\begin{matrix} s . t . & \sum_{G} P (G) C_{i} (G) = ⟨ C_{i} ⟩, i = 0 \dots M \end{matrix}

where Shannon entropy reads

\begin{matrix} S [P] = - \sum_{G} P (G) ln P (G) \end{matrix}

and $\vec{C} (G)$ is the vector of constraints representing the information defining the benchmark itself (notice that $C_{0} = ⟨ C_{0} ⟩ = 1$ sums up the normalization condition). The solution to the problem above can be found by maximizing the Lagrangian function

\begin{matrix} L (P, \vec{θ}) \equiv S [P] + \sum_{i = 0}^{M} θ_{i} [- \sum_{G} P (G) C_{i} (G) + ⟨ C_{i} ⟩] \end{matrix}

with respect to $P (G)$ . As a result one obtains

\begin{matrix} P (G | \vec{θ}) = \frac{e^{- H (G, \vec{θ})}}{Z (\vec{θ})} \end{matrix}

with $H (G, \vec{θ}) = \vec{θ} \cdot \vec{C} (G) = \sum_{i = 1}^{M} θ_{i} C_{i} (G)$ representing the Hamiltonian, i.e. the function summing up the proper, imposed constraints and $Z (\vec{θ}) = \sum_{G} P (G) = \sum_{G} e^{- H (G, \vec{θ})}$ representing the partition function, ensuring that $P (G)$ is properly normalized. Constraints play a pivotal role, either representing the information to filter, in order to assess the significance of certain quantities, or the only available one, in order to reconstruct the inaccessible details of a given configuration.

The maximum likelihood principle

The formalism above is perfectly general; however, it can be instantiated to study an empirical network configuration, say $G^{*}$ . In this case, the Lagrange multipliers ‘acting’ as unknown parameters in Eq. (4) can be numerically estimated by maximizing the associated likelihood function^15,24. The latter is defined as

\begin{matrix} L (\vec{θ}) \equiv ln P (G^{*} | \vec{θ}) = - H (G^{*}, \vec{θ}) - ln Z (\vec{θ}) \end{matrix}

and must be maximized with respect to the vector $\vec{θ}$ . Remarkably, whenever the probability distribution is exponential (as the one deriving from the Shannon entropy maximization), the likelihood maximization problem

\underset{\vec{θ}}{arg max} L (\vec{θ})

is characterized by first-order necessary conditions for optimality reading

\begin{matrix} \nabla_{θ_{i}} L (\vec{θ}) = \frac{\partial L (\vec{θ})}{\partial θ_{i}} = & - C_{i} (G^{*}) - \frac{\partial ln Z (\vec{θ})}{\partial θ_{i}} \\ = & - C_{i} (G^{*}) + \sum_{G} C_{i} (G) P (G) \\ = & - C_{i} (G^{*}) + ⟨ C_{i} ⟩ = 0, i = 1 \dots M \end{matrix}

and leading to the system of equations

\begin{matrix} \nabla L (\vec{θ}) = \vec{0} ⟹ \vec{C} (G^{*}) = ⟨ \vec{C} ⟩ \end{matrix}

to be solved. These conditions, however, are sufficient to characterize a maximum only if $L (\vec{θ})$ is concave. This is indeed the case, as we prove by noticing that

\begin{matrix} H_{ij} = & \nabla_{θ_{i} θ_{j}}^{2} L (\vec{θ}) = \frac{\partial^{2} L (\vec{θ})}{\partial θ_{i} \partial θ_{j}} = - \frac{\partial^{2} ln Z (\vec{θ})}{\partial θ_{i} \partial θ_{j}} \\ = & \frac{\partial ⟨ C_{j} ⟩}{\partial θ_{i}} = - Cov [C_{i}, C_{j}], i, j = 1 \dots M \end{matrix}

i.e. that the Hessian matrix, $H$ , of our likelihood function is ‘minus’ the covariance matrix of the constraints, hence negative semidefinite by definition. The fourth passage is an example of the well-known fluctuation-response relation²⁰.

A graphical representation of how the two principles work is shown in Fig. 1.

Graphical visualization of how the MEP and the MLP work: while the MEP allows the functional form of a probability distribution to be determined analytically, the MLP provides the recipe to numerically determine the parameters defining it.

Combining the MEP and the MLP principles

The Maximum Entropy Principle (MEP) and the Maximum Likelihood Principle (MLP) encode two different prescriptions aiming, respectively, at determining the functional form of a probability distribution and its numerical value. In optimization theory, the problem (the problem 1) is known as primal problem: upon noticing that the Shannon entropy is concave, while the imposed constraints are linear in $P (G)$ , one concludes that the primal problem is convex (it is easy to see this, by rewriting it as a minimization problem for $- S [P]$ ).

As convexity implies strong duality, we can, equivalently, consider an alternative version of the problem to optimize, know as dual problem. In order to define it, let us consider the Lagrangian function

\begin{matrix} L (P, \vec{θ}) \equiv S [P] + \sum_{i = 1}^{M} θ_{i} [- \sum_{G} P (G) C_{i} (G) + C_{i} (G^{*})] \end{matrix}

where, now, the generic expectation of the i-th constraint, $⟨ C_{i} ⟩$ , has been replaced by the corresponding empirical value, $C_{i} (G^{*})$ . As the dual function is given by

\begin{matrix} P (G^{*} | \vec{θ}) \equiv \underset{P}{arg max} L (P, \vec{θ}), \end{matrix}

the dual problem reads

\underset{\vec{θ}}{arg max} \underset{P}{arg min} - L (P (\vec{θ}), \vec{θ})

which is a convex problem by construction; this is readily seen by substituting Eq. (4) into Eq. (10), operation that leads to the expression

\begin{matrix} - L (P (\vec{θ}), \vec{θ}) = - \vec{θ} \cdot \vec{C} (G^{*}) - ln Z (\vec{θ}) = L (\vec{θ}), \end{matrix}

i.e. the likelihood function introduced in Eq. (5). In other words, Eq. (12) combines the MEP and the MLP into a unique optimization step whose score function becomes the Lagrangian function defined in Eq. (10).

Optimization algorithms for non-linear problems

In general, the optimization problem defined in Eq. (12) cannot be solved analytically, whence the need to resort to numerical methods. For an exhaustive review on numerical methods for optimization we refer the interested reader to^25,26: in the following, we present only the concepts that are of some relevance for us. The problemis a Nonlinear Programming Problem (NLP). In order to solve it numerically, we will adopt a Sequential Quadratic Programming (SQP) approach. Starting from an initial guess ${\vec{θ}}^{(0)}$ , SQP solves Eq. (14) by iteratively updating the vector of Lagrange multipliers

\begin{matrix} {\vec{θ}}_{i}^{(n + 1)} = {\vec{θ}}_{i}^{(n)} + α Δ {\vec{θ}}_{i}^{(n)}, i = 1 \dots M \end{matrix}

according to the rule

\begin{matrix} Δ {\vec{θ}}_{i}^{(n)} = {arg max}_{Δ {\vec{θ}}_{i}} [\nabla_{{\vec{θ}}_{i}} L (\vec{θ}) Δ {\vec{θ}}_{i} + \sum_{j, k} \frac{1}{2} Δ {\vec{θ}}_{j} H_{jk}^{(n)} Δ {\vec{θ}}_{k}] \end{matrix}

$\forall i$ , leading to the set of equations

\begin{matrix} \nabla_{i} L (\vec{θ}) + \sum_{j} H_{ij}^{(n)} Δ \vec{θ} = 0, i = 1 \dots M \end{matrix}

which can be compactly rewritten as

\begin{matrix} Δ {\vec{θ}}^{(n)} = - {H^{(n)}}^{- 1} \nabla L (\vec{θ}) . \end{matrix}

The stepsize $α \in (0, 1]$ is selected to ensure that $L ({\vec{θ}}^{(n + 1)}) > L ({\vec{θ}}^{(n)})$ via a backtracking, line search procedure: starting from $α = 1$ , if the Armijo condition

\begin{matrix} L ({\vec{θ}}^{(n)} + α Δ {\vec{θ}}^{(n)}) < L ({\vec{θ}}^{(n)}) + γ α \nabla L {(\vec{θ})}^{⊤} Δ \vec{θ}, \end{matrix}

is violated, we set $α \leftarrow β α$ ( $γ \in (0, 0.5]$ and $β \in (0, 1)$ are the parameters of the algorithm). On the other hand, the term $H^{(n)}$ can be selected according to a variety of methods. In the present contribution we focus on the following three ones.

Newton’s method. One speaks of Newton’s method in case $H^{(n)}$ is chosen to be

\begin{matrix} H^{(n)} = \nabla^{2} L ({\vec{θ}}^{(n)}) + Δ H^{(n)} \end{matrix}

where $\nabla^{2} L (\vec{θ})$ is the Hessian matrix of the likelihood function and the term $Δ H^{(n)}$ is typically selected as small as possible in order to avoid slowing convergence and to ensure that $H^{(n)}$ is negative definite (i.e. $\nabla^{2} L ({\vec{θ}}^{(n)}) + Δ H^{(n)} ≺ 0$ ). This choice of $H^{(n)}$ is also referred to as ‘exact Hessian’.

Quasi-Newton methods. Any Hessian approximation which is negative definite (i.e. satisfying $H^{(n)} ≺ 0$ ) yields an ascent direction and guarantees convergence. Although one may choose to consider the simplest prescription $H^{(n)} = - I$ , which yields the ‘steepest ascent’ algorithm, here we have opted for the following recipe, i.e. the purely diagonal version of Newton’s method: $H_{ii}^{(n)} = \nabla_{ii}^{2} L ({\vec{θ}}^{(n)}) + Δ H_{ii}^{(n)} < 0$ , $\forall i$ and $H_{ij}^{(n)} = 0$ , $\forall i \neq j$ .

Fixed-point iteration on modified KKT conditions. In addition to the (classes of) algorithms above, we will also consider an iterative recipe which is constructed as a fixed-point iteration on a modified version of the Karush-Kuhn-Tucker (KKT) conditions, i.e. $F (\vec{θ}) = \vec{0}$ or, analogously, $\vec{θ} = G (\vec{θ})$ ; the iterate can, then, be made explicit by rewriting the latter as

\begin{matrix} {\vec{θ}}^{(n)} = G ({\vec{θ}}^{(n - 1)}) . \end{matrix}

The condition above will be made explicit, for each network model, in the corresponding subsection. We also observe that this choice yields a non-standard SQP method as $H^{(n)}$ is typically not symmetric, for our models.

A note on convergence. Provided that the Hessian approximation $H^{(n)}$ is negative definite, the direction $Δ \vec{θ}$ is an ascent one; as such, it is guaranteed to yield an improvement of the objective function, for a step size $α$ that is sufficiently small. The role of the backtracking line search is that of finding a step size $α$ that yields such an improvement, while making sufficient progress towards the solution. As discussed in²⁵, Newton’s method has local quadratic convergence, while the quasi-Newton method and the fixed-point iteration algorithm have local linear convergence.

Applications

Let us now apply the algorithms described in the previous section to a number of specific cases of interest. The taxonomy of the models considered in the present paper is shown in Fig. 2 while the constraints defining each model are illustrated in Fig. 3.

System diagram illustrating the models discussed in the present paper and implemented in the ‘NEMTROPY’ package, attached to it: it represents a sort of guide to individuate the best model for analysing the system at hand. Our package handles both monopartite and bipartite networks; while the latter ones have been considered only in their binary, undirected fashion, the former ones can be modeled either in a binary or a weighted fashion, allowing for both undirected and directed links.

Graphical visualization of the constraints defining the ERGMs considered in this work. Notice that while the enhanced models (i.e. UECM and DECM) constrain binary and weighted quantities in a joint fashion, the conditional models (i.e. the CReM ones) allow for a ‘separate’ specification of them.

UBCM: binary undirected graphs with given degree sequence

Let us start by considering binary, undirected networks (BUNs). The simplest, non-trivial set of constraints is represented by the degrees of nodes: the degree of node i, i.e. $k_{i} (A) = \sum_{j (\neq i) = 1}^{N} a_{ij}$ , counts the number of its neighbours and coincides with the total number of 1s along the i-th row (or, equivalently, along the i-th column) of the adjacency matrix $A \equiv {a_{ij}}_{i, j = 1}^{N}$ . The benchmark defined by this set of constraints is known as Undirected Binary Configuration Model (UBCM) and its Hamiltonian reads

\begin{matrix} H_{UBCM} (A, \vec{θ}) = \sum_{i = 1}^{N} θ_{i} k_{i} (A) ; \end{matrix}

entropy maximization^15,16 leads to the factorized graph probability

\begin{matrix} P_{UBCM} (A | \vec{θ}) = \prod_{i = 1}^{N} \prod_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} p_{ij}^{a_{ij}} {(1 - p_{ij})}^{1 - a_{ij}} \end{matrix}

where $p_{ij} = p_{ij}^{UBCM} \equiv \frac{e^{- θ_{i} - θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}}$ . In this case, the canonical ensemble of BUNs is the set of networks with the same number of nodes, N, of the observed graph and a number of (undirected) links varying from zero to the maximum value $(\begin{matrix} N \\ 2 \end{matrix})$ . The argument of the problem (6) for the specific network $A^{*}$ becomes

\begin{matrix} L_{UBCM} (\vec{θ}) = - \sum_{i = 1}^{N} θ_{i} k_{i} (A^{*}) - \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} ln [1 + e^{- θ_{i} - θ_{j}}] \end{matrix}

whose first-order optimality conditions read

\begin{matrix} \nabla_{θ_{i}} L_{UBCM} = & - k_{i} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{e^{- θ_{i} - θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}} \\ = & - k_{i} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij}^{UBCM} \\ = & - k_{i} (A^{*}) + ⟨ k_{i} ⟩ = 0, i = 1 \dots N . \end{matrix}

Resolution of the UBCM. Newton’s and the quasi-Newton method can be easily implemented via the recipe defined in Eq. (18) (see “Appendix A” for the definition of the UBCM Hessian).

The explicit definition of the fixed-point recipe, instead, requires a preliminary observation, i.e. that the system of equations embodying the UBCM first-order optimality conditions can be re-written as follows

\begin{matrix} e^{- θ_{i}} = \frac{k_{i} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})}, i = 1 \dots N \end{matrix}

i.e. as a set of consistency equations. The observation that the term $e^{- θ_{i}}$ appears on both sides of the equation corresponding to the i-th constraint suggests an iterative recipe to solve such a system, i.e.

\begin{matrix} θ_{i}^{(n)} = - ln [\frac{k_{i} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{j}^{(n - 1)}}}{1 + e^{- θ_{i}^{(n - 1)} - θ_{j}^{(n - 1)}}})}], i = 1 \dots N \end{matrix}

originally proposed in²² and further refined in²³. The identification $p_{ij}^{UBCM} \equiv \frac{e^{- θ_{i}^{(\infty)} - θ_{j}^{(\infty)}}}{1 + e^{- θ_{i}^{(\infty)} - θ_{j}^{(\infty)}}}$ , $\forall i < j$ allows the probability coefficients defining the UBCM to be numerically determined.

As any other iterative recipe, the one proposed above needs to be initialized as well. To this aim, we have tested three different sets of initial values: the first one is defined by the position $θ_{i}^{(0)} = - ln [\frac{k_{i} (A^{*})}{\sqrt{2 L}}]$ , $\forall i$ - usually, a good approximation of the solution of the system of Eq. (25), in the ‘sparse case’ (i.e. whenever $p_{ij}^{UBCM} ≃ e^{- θ_{i} - θ_{j}}$ the second one is a variant of the position above, reading $θ_{i}^{(0)} = - ln [\frac{k_{i} (A^{*})}{\sqrt{N}}]$ , $\forall i$ ; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution with support on the unit interval, i.e. $θ_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ .

Reducing the dimensionality of the problem. The problem defining the UBCM can be further simplified by noticing that nodes with the same degree, say k, can be assigned the same value of the multiplier $θ$ ²⁴ - a result resting upon the observation that any value $k_{i} (A^{*})$ must match the sum of monotonic, increasing functions. This translates into the possibility of rewriting $L_{UBCM} (\vec{θ})$ in a ‘reduced’ fashion, as

\begin{matrix} L_{UBCM}^{reduced} (\vec{θ}) = & - \sum_{k} f (k) θ_{k} k (A^{*}) \\ - \sum_{k} \sum_{\begin{matrix} k^{'} \\ (k^{'} \leq k) \end{matrix}} f (k) [f (k^{'}) - δ_{k k^{'}}] \cdot \\ \cdot ln [1 + e^{- θ_{k} - θ_{k^{'}}}] \end{matrix}

where the sums run over the distinct values of the degrees and f(k) counts the number of nodes whose degree is k. Rewriting the problem with respect to the set ${θ_{k}}_{k}$ leads one to recover simplified versions of the three algorithms considered here: Newton’s and the quasi-Newton methods can, now, be solved via a ‘reduced’ version of eq. (18) (since both the dimension of the gradient and the order of the Hessian matrix of the likelihood function are, now, less than N), while the iterative recipe defined in (26) can be rewritten in terms of the ‘non-degenerate’ degrees, as

\begin{matrix} θ_{k}^{(n)} = - ln [\frac{k (A^{*})}{\sum_{k^{'}} [f (k^{'}) - δ_{k k^{'}}] (\frac{e^{- θ_{k^{'}}^{(n - 1)}}}{1 + e^{- θ_{k}^{(n - 1)} - θ_{k^{'}}^{(n - 1)}}})}] \end{matrix}

$\forall k$ , where, at the denominator, the self-contribution (i.e. the probability that a node links to itself) has been explicitly excluded.

Performance testing. The performance of the three algorithms, considered in the present paper, to solve the reduced version of Eq. (25), has been tested on a bunch of real-world networks. The latter ones span a wide variety of systems, including natural, financial and technological ones. In particular, we have considered the synaptic network of the worm C. Elegans²⁸, the network of the largest US airports²⁹, the protein-protein interaction network of the bacterium H. Pylori³⁰, Internet at the level of Autonomous Systems³¹ and eight daily snapshots of the so-called Bitcoin Lightning Network³², chosen throughout its entire history. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UBCM has been quantified via the maximum absolute error metrics, defined, in a perfectly general fashion, as ${max}_{i} {|C_{i}^{*} - ⟨ C_{i} ⟩|}_{i = 1}^{N}$ (where $C_{i}^{*}$ is the empirical value of the i-th constraint, $C_{i}$ ). Naturally, in the UBCM case, $C_{i} = k_{i}$ , $\forall i$ and the aforementioned error score becomes

\begin{matrix} MADE = max_{i} {|k_{i}^{*} - ⟨ k_{i} ⟩|}_{i = 1}^{N} \end{matrix}

(the acronym standing for Maximum Absolute Degree Error). Equivalently, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

For each algorithm, we have considered three different stopping criteria: the first one puts a condition on the Euclidean norm of the gradient of the likelihood function, i.e.

\begin{matrix} | | \nabla L (\vec{θ}) {| |}_{2} = \sqrt{\sum_{i = 1}^{N} {(\nabla_{i}, L, (\vec{θ}))}^{2}} \leq 10^{- 8} ; \end{matrix}

the second one puts a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e.

\begin{matrix} | | Δ \vec{θ} {| |}_{2} = \sqrt{\sum_{i = 1}^{N} {(Δ, θ_{i})}^{2}} \leq 10^{- 8} ; \end{matrix}

the third one concerns the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in Table 1. Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; moreover, all algorithms stop either because the condition on the norm of the likelihood is satisfied or because the condition on the norm of the vector of parameters is satisfied.

Table 1.

Performance of Newton’s, quasi-Newton and the fixed-point algorithm to solve the reduced system of equations defining the UBCM, on a set of real-world BUNs (of which basic statistics as the total number of nodes, N, the total number of links, L, and the connectance, $c = 2 L / N (N - 1)$ , are provided).

	N	L	c	$c_{r}$	Newton		Quasi-Newton		Fixed-point
	N	L	c	$c_{r}$	MADE	Time (s)	MADE	Time (s)	MADE	Time (s)
C. Elegans (nn)	265	1879	$≃ 5 \cdot 10^{- 2}$	$≃ 1.5 \cdot 10^{- 1}$	$≃ 8.1 \cdot 10^{- 8}$	$≃ 0.005$	$≃ 1 \cdot 10^{- 6}$	$≃ 0.03$	$≃ 5 \cdot 10^{- 8}$	$≃ 0.004$
US airports	500	2980	$≃ 2 \cdot 10^{- 2}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 8.9 \cdot 10^{- 9}$	$≃ 0.008$	$≃ 1.2 \cdot 10^{- 6}$	$≃ 0.04$	$≃ 2.5 \cdot 10^{- 7}$	$≃ 0.005$
H. Pylori (pp)	732	1465	$≃ 5 \cdot 10^{- 3}$	$≃ 4.5 \cdot 10^{- 2}$	$≃ 1.3 \cdot 10^{- 8}$	$≃ 0.004$	$≃ 6.7 \cdot 10^{- 7}$	$≃ 0.03$	$≃ 7 \cdot 10^{- 8}$	$≃ 0.014$
Internet (AS)	11174	23409	$≃ 4 \cdot 10^{- 4}$	$≃ 1 \cdot 10^{- 2}$	$≃ 4.1 \cdot 10^{- 7}$	$≃ 0.03$	$≃ 5.1 \cdot 10^{- 6}$	$≃ 0.10$	$≃ 2 \cdot 10^{- 6}$	$≃ 0.005$
BLN 24-01-18	94	152	$≃ 3 \cdot 10^{- 3}$	$≃ 1.5 \cdot 10^{- 1}$	$≃ 1 \cdot 10^{- 8}$	$≃ 0.005$	$≃ 1.7 \cdot 10^{- 6}$	$≃ 0.014$	$≃ 4 \cdot 10^{- 8}$	$≃ 0.002$
BLN 25-02-18	499	1010	$≃ 8 \cdot 10^{- 3}$	$≃ 6.4 \cdot 10^{- 2}$	$≃ 1.6 \cdot 10^{- 8}$	$≃ 0.005$	$≃ 9.1 \cdot 10^{- 7}$	$≃ 0.02$	$≃ 7.3 \cdot 10^{- 8}$	$≃ 0.004$
BLN 30-03-18	1012	2952	$≃ 5 \cdot 10^{- 3}$	$≃ 5.2 \cdot 10^{- 2}$	$≃ 3.4 \cdot 10^{- 10}$	$≃ 0.005$	$≃ 2 \cdot 10^{- 5}$	$≃ 0.03$	$≃ 1.4 \cdot 10^{- 7}$	$≃ 0.005$
BLN 13-07-18	1999	8999	$≃ 4 \cdot 10^{- 3}$	$≃ 3.8 \cdot 10^{- 2}$	$≃ 6.7 \cdot 10^{- 10}$	$≃ 0.01$	$≃ 3 \cdot 10^{- 6}$	$≃ 0.05$	$≃ 1.7 \cdot 10^{- 7}$	$≃ 0.008$
BLN 19-12-18	3007	17689	$≃ 4 \cdot 10^{- 3}$	$≃ 4.5 \cdot 10^{- 2}$	$≃ 1.7 \cdot 10^{- 6}$	$≃ 0.03$	$≃ 1 \cdot 10^{- 5}$	$≃ 0.09$	$≃ 2 \cdot 10^{- 7}$	$≃ 0.010$
BLN 30-01-19	3996	27429	$≃ 3 \cdot 10^{- 3}$	$≃ 4.2 \cdot 10^{- 2}$	$≃ 7.2 \cdot 10^{- 7}$	$≃ 0.04$	$≃ 1.3 \cdot 10^{- 5}$	$≃ 0.12$	$≃ 2.7 \cdot 10^{- 7}$	$≃ 0.012$
BLN 01-03-19	5012	41096	$≃ 3 \cdot 10^{- 3}$	$≃ 4.7 \cdot 10^{- 2}$	$≃ 3.9 \cdot 10^{- 10}$	$≃ 0.03$	$≃ 1.7 \cdot 10^{- 4}$	$≃ 0.13$	$≃ 2.6 \cdot 10^{- 7}$	$≃ 0.013$
BLN 17-07-19	6447	54476	$≃ 3 \cdot 10^{- 3}$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.1 \cdot 10^{- 6}$	$≃ 0.06$	$≃ 1.2 \cdot 10^{- 5}$	$≃ 0.15$	$≃ 3.1 \cdot 10^{- 7}$	$≃ 0.015$

Open in a new tab

All algorithms stop either because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied or because the condition $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 8}$ is satisfied. For what concerns accuracy, the two most accurate methods are Newton’s and the fixed-point ones; for what concerns speed, the fastest method is the fixed-point one (although Newton’s one approximately requires the same amount of time on each specific configuration). Only the results corresponding to the best choice of initial conditions are reported.

For what concerns accuracy, the largest maximum error per method spans an interval (across all configurations) that amounts at $10^{- 10} ≲ {MADE}_{Newton}^{reduced} ≲ 10^{- 6}$ , $10^{- 6} \leq {MADE}_{Quasi-Newton}^{reduced} \leq 10^{- 5}$ and $10^{- 8} ≲ {MADE}_{fixed-point}^{reduced} ≲ 10^{- 6}$ . By looking at each specific network, it is evident that the two most accurate methods are systematically Newton’s and the fixed-point ones.

For what concerns speed, the amount of time required by each method to achieve convergence spans an interval (across all configurations) that is $0.005 \leq T_{Newton}^{reduced} \leq 0.01$ , $0.014 \leq T_{Quasi-Newton}^{reduced} \leq 0.15$ and $0.002 \leq T_{fixed-point}^{reduced} \leq 0.015$ (time is measured in seconds). The fastest method is the fixed-point one, although Newton’s method approximately requires the same amount of time, when compared to it on each specific configuration. Differences in the speed of convergence of any method, caused by the choice of a particular set of initial conditions, are indeed observable: the prescription reading $θ_{i}^{(0)} = - ln [\frac{k_{i} (A^{*})}{\sqrt{N}}]$ , $\forall i$ outperforms the other ones.

Let us now comment on the scalability of our algorithms. What we learn from our exercise is that scalability is not related to the network size in a simple way: the factors seemingly playing a major role are the ones affecting the reducibility of the original system of equations, i.e. the ones ‘deciding’ the number of different equations that actually need to be solved.

While reducibility can be easily quantified a posteriori, e.g. by calculating the coefficient of reduction, $c_{r}$ , defined as the ratio between the number of equations that survive to reduction and the number of equations defining the original problem (hence, the smaller the better), providing an exhaustive list of the aforementioned factors a priori is much more difficult.

In the case of the UBCM, $c_{r}$ is defined as the number of different degrees divided by the total number of nodes; one may, thus, argue that reducibility is affected by the heterogeneity of the degree distribution; upon considering that the latter can be quantified by computing the coefficient of variation (defined as $c_{v} = s / m$ , where s and m are, respectively, the standard deviation and the mean of the degree distribution of the network at hand), one may derive a simple rule of thumb: a larger coefficient of variation (pointing out a larger heterogeneity of the degree distribution) leads to a larger coefficient of reduction and a larger amount of time for convergence will be required. Notice that even if the degree distribution is narrow, outliers (e.g. hubs) may still play a role, forcing the corresponding parameters to assume either very large or very small values - hence, slowing down the entire convergence process.

In this sense, scalability is the result of a (non-trivial) interplay between size and reducibility. Let us take a look at Table 1: Internet is the most reducible network of our basket, although being the largest in size, while the neural network of C. Elegans is one of the least reducible networks of our basket, although being the second smallest one; as a consequence, the actual number of equations defining the UBCM on C. Elegans is $≃ 30$ while the actual number of equations defining the UBCM on Internet is $≃ 100$ - whence the larger amount of time to solve the latter. Remarkably, the time required by our recipes to ensure that the largest system of equations converges to the solution ranges from thousandths to tenths of seconds.

As a last comment, we would like to stress that, unlike several popular approximations as the Chung-Lu one²⁷, the generic coefficient $p_{ij}^{UBCM}$ always represents a proper probability, in turn implying that eq. (23) also provides us with a recipe to sample the canonical ensemble of BUNs, under the UBCM. Notice that the factorization of the graph probability $P_{UBCM} (A | \vec{θ})$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

\begin{matrix} a_{ij} = \{\begin{matrix} 0 & 1 - p_{ij}^{UBCM} \\ 1 & p_{ij}^{UBCM} \end{matrix}) \end{matrix}

for each (undirected) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at $O (N^{2})$ , can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the UBCM ensemble is summed up by Algorithm 1.

We explicitly acknowledge the existence of the algorithm proposed in³³ for sampling binary, undirected networks from the Chung-Lu model (i.e. the ‘sparse case’ approximation of the UBCM), a recipe that is applicable whenever the condition $p_{ij}^{CL} = \frac{k_{i} k_{j}}{2 L} < 1$ , $\forall i < j$ is verified. As explicitly acknowledged by the authors of ³³. However, it does not hold in several cases of interest: an example of paramount importance is provided by sparse networks whose degree distribution is scale-free. In such cases, $k_{\max} \sim N^{\frac{1}{γ - 1}}$ : hence, the hubs establish a connection with probability $p_{ij}^{CL} \sim \frac{N^{\frac{2}{γ - 1}}}{N - N^{\frac{1}{γ - 1}}}$ that becomes larger than 1 when $2 < γ \leq 3$ and diverges for $γ \to 2$ , thus leading to a strong violation of the requirement above. graphic file with name 41598_2021_93830_Figa_HTML.jpg

DBCM: binary directed graphs with given in-degree and out-degree sequences

Let us now move to consider binary, directed networks (BDNs). In this case, the simplest, non-trivial set of constraints is represented by the in-degrees and the out-degrees of nodes, where $k_{i}^{in} (A) = \sum_{j (\neq i) = 1}^{N} a_{ji}$ counts the number of nodes ‘pointing’ to node i and $k_{i}^{out} (A) = \sum_{j (\neq i) = 1}^{N} a_{ij}$ counts the number of nodes i ‘points’ to. The benchmark defined by this set of constraints is known as Directed Binary Configuration Model (DBCM) whose Hamiltonian reads

\begin{matrix} H_{DBCM} (A, \vec{α}, \vec{β}) = \sum_{i = 1}^{N} [α_{i} k_{i}^{out} (A) + β_{i} k_{i}^{in} (A)] ; \end{matrix}

as in the undirected case, entropy maximization^15,16 leads to a factorized probability distribution, i.e.

\begin{matrix} P_{DBCM} (A | \vec{α}, \vec{β}) = \prod_{i = 1}^{N} \prod_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij}^{a_{ij}} {(1 - p_{ij})}^{1 - a_{ij}} \end{matrix}

where $p_{ij} = p_{ij}^{DBCM} \equiv \frac{e^{- α_{i} - β_{j}}}{1 + e^{- α_{i} - β_{j}}}$ . The canonical ensemble of BDNs is, now, the set of networks with the same number of nodes, N, of the observed graph and a number of (directed) links varying from zero to the maximum value $N (N - 1)$ . The argument of the problem (6) for the specific network $A^{*}$ becomes

\begin{matrix} L_{DBCM} (\vec{α}, \vec{β}) = - & \sum_{i = 1}^{N} [α_{i} k_{i}^{out} (A^{*}) + β_{i} k_{i}^{in} (A^{*})] \\ - & \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} ln [1 + e^{- α_{i} - β_{j}}] \end{matrix}

whose first-order optimality conditions read

\begin{matrix} \nabla_{α_{i}} L_{DBCM} = & - k_{i}^{out} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{e^{- α_{i} - β_{j}}}{1 + e^{- α_{i} - β_{j}}} \\ = & - k_{i}^{out} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij}^{DBCM} \\ = & - k_{i}^{out} (A^{*}) + ⟨ k_{i}^{out} ⟩ = 0, i = 1 \dots N \end{matrix}

and

\begin{matrix} \nabla_{β_{i}} L_{DBCM} = & - k_{i}^{in} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{e^{- α_{j} - β_{i}}}{1 + e^{- α_{j} - β_{i}}} \\ = & - k_{i}^{in} (A^{*}) + \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ji}^{DBCM} \\ = & - k_{i}^{in} (A^{*}) + ⟨ k_{i}^{in} ⟩ = 0, i = 1 \dots N . \end{matrix}

Resolution of the DBCM. Newton’s and the quasi-Newton method can be easily implemented via the recipe defined in Eq. (18) (see “Appendix A” for the definition of the DBCM Hessian).

The fixed-point recipe for solving the system of equations embodying the DBCM first-order optimality conditions can, instead, be re-written in the usual iterative fashion as follows

\begin{matrix} α_{i}^{(n)} = - ln [\frac{k_{i}^{out} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- β_{j}^{(n - 1)}}}{1 + e^{- α_{i}^{(n - 1)} - β_{j}^{(n - 1)}}})}], i = 1 \dots N \\ β_{i}^{(n)} = - ln [\frac{k_{i}^{in} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{j}^{(n - 1)}}}{1 + e^{- α_{j}^{(n - 1)} - β_{i}^{(n - 1)}}})}], i = 1 \dots N \end{matrix}

Analogously to the undirected case, the initialization of this recipe has been implemented in three different ways. The first one reads $α_{i}^{(0)} = - ln [\frac{k_{i}^{out} (A^{*})}{\sqrt{L}}]$ , $i = 1 \dots N$ and $β_{i}^{(0)} = - ln [\frac{k_{i}^{in} (A^{*})}{\sqrt{L}}]$ , $i = 1 \dots N$ and represents a good approximation to the solution of the system of equations defining the DBCM in the ‘sparse case’ (i.e. whenever $p_{ij}^{DBCM} ≃ e^{- α_{i} - β_{j}}$ ); the second one is a variant of the position above, reading $α_{i}^{(0)} = - ln [\frac{k_{i}^{out} (A^{*})}{\sqrt{N}}]$ , $i = 1 \dots N$ and $β_{i}^{(0)} = - ln [\frac{k_{i}^{in} (A^{*})}{\sqrt{N}}]$ , $i = 1 \dots N$ ; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution defined on the unit interval, i.e. $α_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ and $β_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ . As for the UBCM, the identification $p_{ij}^{DBCM} \equiv \frac{e^{- α_{i}^{(\infty)} - β_{j}^{(\infty)}}}{1 + e^{- α_{i}^{(\infty)} - β_{j}^{(\infty)}}}$ , $\forall i \neq j$ allows the probability coefficients defining the DBCM to be numerically determined.

Reducing the dimensionality of the problem. As for the UBCM, we can define a ‘reduced’ version of the DBCM likelihood, accounting only for the distinct (pairs of) values of the degrees. By defining $k^{out} \equiv k$ and $k^{in} \equiv h$ , in order to simplify the formalism, the reduced DBCM recipe reads

\begin{matrix} L_{DBCM}^{reduced} (\vec{θ}) = & - \sum_{k} \sum_{h} n (k, h) [α_{k, h} k (A^{*}) + β_{k, h} h (A^{*})] \\ - \sum_{k, h} \sum_{k^{'}, h^{'}} n (k, h) [n (k^{'}, h^{'}) - δ_{k k^{'}} δ_{h h^{'}}] \cdot \\ \cdot ln [1 + e^{- α_{k, h} - β_{k^{'}, h^{'}}}] ; \end{matrix}

the implementation of the algorithms considered here must be modified in a way that is analogous to the one already described for the UBCM. In particular, the fixed-point recipe for the DBCM can be re-written by assigning to the nodes with the same out- and in-degrees (k, h) the same pair of values $(α, β)$ , i.e. as

\begin{matrix} α_{k, h}^{(n)} = & - ln [\frac{k (A^{*})}{\sum_{k^{'}, h^{'}} [n (k^{'}, h^{'}) - δ_{k k^{'}} δ_{h h^{'}}] (\frac{e^{- β_{k^{'}, h^{'}}^{(n - 1)}}}{1 + e^{- α_{k, h}^{(n - 1)} - β_{k^{'}, h^{'}}^{(n - 1)}}})}], \forall k, h \end{matrix}

\begin{matrix} β_{k, h}^{(n)} = & - ln [\frac{h (A^{*})}{\sum_{k^{'}, h^{'}} [n (k^{'}, h^{'}) - δ_{k k^{'}} δ_{h h^{'}}] (\frac{e^{- α_{k^{'}, h^{'}}^{(n - 1)}}}{1 + e^{- α_{k, h}^{(n - 1)} - β_{k^{'}, h^{'}}^{(n - 1)}}})}], \forall k, h \end{matrix}

where the sums, now, run over the distinct values of the out- and in-degrees, n(k, h) is the number of nodes whose out-degree is k and whose in-degree is h and, as usual, the last term at the denominator excludes the self-contribution (i.e. the probability that a node links to itself).

Performance testing. As for the UBCM, the performance of the three algorithms in solving the reduced version of Eqs. (37) and (38) has been tested on a bunch of real-world networks. The latter ones span economic, financial and social networks. In particular, we have considered the World Trade Web (WTW) during the decade 1992–2002³⁴, a pair of snapshots of the Bitcoin User Network at the weekly time scale (the first day of those weeks being 13-02-12 and 27-04-15, respectively)³⁵ and of the corresponding largest weakly connected component (whose size is, respectively, $≃ 65 %$ and $≃ 90 %$ of the full network size) and a snapshot of the semantic network concerning the Twitter discussion about the Covid-19 pandemics (more precisely: it is the network of re-tweets of the (online) Italian debate about Covid-19, collected in the period 21st February–20th April 2020)³⁶. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the DBCM has been quantified via the maximum absolute error metrics that, in this case, reads

\begin{matrix} MADE = max_{i} {|k_{i}^{*} - ⟨ k_{i} ⟩|, |h_{i}^{*} - ⟨ h_{i} ⟩|}_{i = 1}^{N} \end{matrix}

and accounts for the presence of two different degrees per node. As for the UBCM, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

The three different ‘stop criteria’ we have considered match the ones adopted for analysing the undirected case and consist in a condition on the Euclidean norm of the gradient of the likelihood function, i.e. $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ , in a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e. $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 8}$ , and in a condition on the maximum number of iterations: after 10,000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in Table 2. Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; however, while Newton’s and the quasi-Newton methods stop because the condition on the norm of the likelihood is satisfied, the fixed-point recipe is always found to satisfy the limit condition on the number of steps (i.e. it runs for 10000 steps and, then, stops).

Table 2.

Performance of Newton’s, quasi-Newton and the fixed-point algorithm to solve the reduced system of equations defining the DBCM, on a set of real-world BDNs (of which basic statistics as the total number of nodes, N, the total number of links, L, and the connectance, $c = L / N (N - 1)$ , are provided).

	N	L	c	$c_{r}$	Newton		Quasi-Newton		Fixed-point
	N	L	c	$c_{r}$	MADE	Time (s)	MADE	Time (s)	MADE	Time (s)
WTW 92	162	5891	$≃ 2.3 \cdot 10^{- 1}$	$≃ 2.8 \cdot 10^{- 1}$	$≃ 2.6 \cdot 10^{- 9}$	$≃ 3$	$≃ 1.4 \cdot 10^{- 10}$	$≃ 0.12$	$≃ 3.5 \cdot 10^{- 2}$	$≃ 2.5$
WTW 93	162	7384	$≃ 2.8 \cdot 10^{- 1}$	$≃ 3.5 \cdot 10^{- 1}$	$≃ 6.7 \cdot 10^{- 9}$	$≃ 0.5$	$≃ 5.3 \cdot 10^{- 7}$	$≃ 0.16$	$≃ 3.4 \cdot 10^{- 2}$	$≃ 3.1$
WTW 94	162	9395	$≃ 3.6 \cdot 10^{- 1}$	$≃ 4.1 \cdot 10^{- 1}$	$≃ 2.6 \cdot 10^{- 9}$	$≃ 7.4$	$≃ 3.1 \cdot 10^{- 10}$	$≃ 0.18$	$≃ 3.5 \cdot 10^{- 2}$	$≃ 3.7$
WTW 95	162	10947	$≃ 4.2 \cdot 10^{- 1}$	$≃ 4.3 \cdot 10^{- 1}$	$≃ 4 \cdot 10^{- 9}$	$≃ 16$	$≃ 4 \cdot 10^{- 7}$	$≃ 0.21$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 3.9$
WTW 96	162	11869	$≃ 4.6 \cdot 10^{- 1}$	$≃ 4.5 \cdot 10^{- 1}$	$≃ 3.3 \cdot 10^{- 9}$	$≃ 1.1$	$≃ 9.2 \cdot 10^{- 7}$	$≃ 0.33$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.1$
WTW 97	162	12840	$≃ 4.9 \cdot 10^{- 1}$	$≃ 4.5 \cdot 10^{- 1}$	$≃ 2.4 \cdot 10^{- 9}$	$≃ 13$	$≃ 2.4 \cdot 10^{- 10}$	$≃ 0.16$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.2$
WTW 98	162	13344	$≃ 5.1 \cdot 10^{- 1}$	$≃ 4.5 \cdot 10^{- 1}$	$≃ 2.4 \cdot 10^{- 9}$	$≃ 9.3$	$≃ 3.9 \cdot 10^{- 10}$	$≃ 0.16$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.2$
WTW 99	162	13810	$≃ 5.3 \cdot 10^{- 1}$	$≃ 4.5 \cdot 10^{- 1}$	$≃ 2.5 \cdot 10^{- 9}$	$≃ 13$	$≃ 9.4 \cdot 10^{- 7}$	$≃ 0.3$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.1$
WTW 00	162	14095	$≃ 5.4 \cdot 10^{- 1}$	$≃ 4.8 \cdot 10^{- 1}$	$≃ 1.8 \cdot 10^{- 9}$	$≃ 10$	$≃ 3.9 \cdot 10^{- 10}$	$≃ 0.17$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.4$
WTW 01	162	14521	$≃ 5.6 \cdot 10^{- 1}$	$≃ 4.7 \cdot 10^{- 1}$	$≃ 6.8 \cdot 10^{- 9}$	$≃ 10$	$≃ 4.2 \cdot 10^{- 6}$	$≃ 0.22$	$≃ 3.4 \cdot 10^{- 2}$	$≃ 4.3$
WTW 02	162	13911	$≃ 5.3 \cdot 10^{- 1}$	$≃ 4.6 \cdot 10^{- 1}$	$≃ 3.8 \cdot 10^{- 9}$	$≃ 13$	$≃ 5.8 \cdot 10^{- 10}$	$≃ 0.18$	$≃ 3.3 \cdot 10^{- 2}$	$≃ 4.3$
${BTC}_{week-1}^{CC}$	13576	20604	$≃ 1.1 \cdot 10^{- 4}$	$≃ 1 \cdot 10^{- 2}$	$≃ 3 \cdot 10^{- 7}$	$≃ 1.3$	$≃ 1.5 \cdot 10^{- 4}$	$≃ 0.4$	$≃ 4 \cdot 10^{- 6}$	$≃ 0.1$
${BTC}_{week-1}$	20984	25553	$≃ 5.8 \cdot 10^{- 5}$	$≃ 6.6 \cdot 10^{- 3}$	$≃ 5.1 \cdot 10^{- 4}$	$≃ 4$	$≃ 2.7 \cdot 10^{- 7}$	$≃ 0.11$	$≃ 6 \cdot 10^{- 6}$	$≃ 0.05$
${BTC}_{week-2}^{CC}$	297338	554643	$≃ 6 \cdot 10^{- 6}$	$≃ 2 \cdot 10^{- 3}$	$≃ 9 \cdot 10^{- 9}$	$≃ 12$	$≃ 7.4 \cdot 10^{- 5}$	$≃ 7$	$≃ 8 \cdot 10^{- 6}$	$≃ 0.65$
${BTC}_{week-2}$	338334	571551	$≃ 5 \cdot 10^{- 6}$	$≃ 1.9 \cdot 10^{- 3}$	$≃ 1.5 \cdot 10^{- 4}$	$≃ 16$	$≃ 9.5 \cdot 10^{- 4}$	$≃ 2.3$	$≃ 1.1 \cdot 10^{- 5}$	$≃ 0.5$
Twitter	436551	1489857	$≃ 7.8 \cdot 10^{- 6}$	$≃ 4.6 \cdot 10^{- 3}$	$≃ 9.5 \cdot 10^{- 9}$	$≃ 6$	$≃ 4.9 \cdot 10^{- 3}$	$≃ 40$	$≃ 1.3 \cdot 10^{- 5}$	$≃ 7.6$

Open in a new tab

For what concerns the World Trade Web, both Newton’s and the quasi-Newton methods stop because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied; the fixed-point recipe, instead, always reaches the limit of 10,000 steps. The fastest and most accurate method is systematically the quasi-Newton one. The picture changes when very large networks, as Bitcoin and Twitter, are considered: in these cases, the fastest and most accurate method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.

Let us start by commenting the results concerning the WTW. For what concerns accuracy, the largest maximum error per method spans an interval, across all configurations, that amounts at ${MADE}_{Newton}^{reduced} ≃ 10^{- 9}$ , $10^{- 10} ≲ {MADE}_{Quasi-Newton}^{reduced} ≲ 10^{- 6}$ and ${MADE}_{fixed-point}^{reduced} ≃ 10^{- 2}$ . By looking at each specific network, it is evident that the most accurate method is systematically the quasi-Newton one.

For what concerns speed, the amount of time required by each method to achieve convergence spans an interval (across all configurations) that is $0.5 \leq T_{Newton}^{reduced} \leq 13$ , $0.21 \leq T_{Quasi-Newton}^{reduced} \leq 0.33$ and $2.5 \leq T_{fixed-point}^{reduced} \leq 4.4$ (time is measured in seconds). The fastest method is the quasi-Newton one, followed by Newton’s method and the fixed-point recipe. The latter is the slowest method since it always reaches the limit of 10000 steps while the other two competing ones stop after few iterations. Appreciable differences in the speed of convergence of any method, caused by the choice of a particular set of initial conditions, are not observed.

The observations above hold true when the WTW is considered. The picture changes when very large networks, as Bitcoin and Twitter, are considered. First, let us notice that Bitcoin and Twitter ‘behave’ as the undirected version of Internet considered to solve the UBCM, i.e. they are very redundant, hosting many nodes with the same out- and in-degrees (in fact, the coefficient of reduction, $c_{r}$ , is, now, defined as the number of different ‘out-degree - in-degree’ pairs divided by twice the number of nodes). To provide a specific example, out of the original 676688 equations defining the DBCM for one of the two Bitcoin snapshots considered here, only $≃ 339$ equations survive the reduction; by converse, the WTW can be reduced to a much smaller extent (to be more specific, out of the original 324 equations defining the DBCM for the WTW in 1997, only $≃ 291$ equations survive the reduction). Interestingly, a good proxy of the reducibility of the directed configurations considered here is provided by their connectance (i.e. the denser the network, the less reducible it is).

On the one hand, this feature, common to the very large networks considered here, is what guarantees their resolution in a reasonable amount of time; on the other one, it seems not to be enough to let Newton’s and the quasi-Newton method be as fast as in the undirected case. For our binary, directed networks, in fact, the fastest (and, for some configurations, the most accurate) method becomes the fixed-point one. In order to understand this result we need to consider that both Newton’s and the quasi-Newton method require (some proxy of) the Hessian matrix of the DBCM to update the value of the parameters: since the order of the latter is $O (N^{2})$ for Newton’s method and O(N) for the quasi-Newton one, its calculation can be (very) time demanding - beside requiring a lot of memory for the step-wise update of the corresponding Hessian matrix. However, while this is compensated by a larger accuracy in the case of Newton’s method, this is no longer true when the quasi-Newton recipe is considered - the reason maybe lying in the poorer approximation provided by the diagonal of the Hessian matrix in case of systems like these.

As a last comment, we would like to stress that, as in the undirected case, the generic coefficient $p_{ij}^{DBCM}$ represents a proper probability, in turn implying that Eq. (35) also provides us with a recipe to sample the canonical ensemble of BDNs, under the DBCM. Notice that the factorization of the graph probability $P_{DBCM} (A | \vec{θ})$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

\begin{matrix} a_{ij} = \{\begin{matrix} 0 & 1 - p_{ij}^{DBCM} \\ 1 & p_{ij}^{DBCM} \end{matrix}) \end{matrix}

for each (directed) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at $O (N^{2})$ , can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the DBCM ensemble is summed up by Algorithm 2. graphic file with name 41598_2021_93830_Figb_HTML.jpg

BiCM: bipartite binary undirected graphs with given degree sequences

So far, we have considered monopartite networks. However, the algorithm we have described for solving the DBCM can be adapted, with little effort, to solve a null model designed for bipartite, binary, undirected networks (BiBUNs), i.e. the so-called Bipartite Configuration Model (BiCM)³⁷. These networks are defined by two distinct layers (say, $⊤$ and $⊥$ ) and obey the rule that links can exist only between (and not within) layers: for this reason, they can be compactly described via a biadjacency matrix $B \equiv {b_{i α}}_{i, α}$ whose generic entry $b_{i α}$ is 1 if node i belonging to layer $⊥$ is linked to node $α$ belonging to layer $⊤$ and 0 otherwise. The constraints defining the BiCM are represented by the degree sequences ${k_{i}}_{i \in ⊥}$ and ${d_{α}}_{α \in ⊤}$ where $k_{i} = \sum_{α \in ⊤} b_{i α}$ counts the neighbors of node i (belonging to layer $⊤$ ) and $d_{α} = \sum_{i \in ⊥} b_{i α}$ counts the neighbors of node $α$ (belonging to layer $⊥$ ).

Analogously to the DBCM case,

\begin{matrix} P (B | \vec{γ}, \vec{β}) = \prod_{i \in ⊥} \prod_{α \in ⊤} p_{i α}^{b_{i α}} {(1 - p_{i α})}^{1 - b_{i α}} \end{matrix}

where $p_{i α} = p_{i α}^{BiCM} \equiv \frac{e^{- γ_{i} - β_{α}}}{1 + e^{- γ_{i} - β_{α}}}$ . The canonical ensemble of BiBUNs includes all networks with, say, N nodes on one layer, M nodes on the other layer and a number of links (connecting nodes of different layers) ranging from zero to the maximum value $N \cdot M$ .

The BiCM likelihood function reads

\begin{matrix} L_{BiCM} (\vec{γ}, \vec{β}) = - & \sum_{i \in ⊥} γ_{i} k_{i} (B^{*}) - \sum_{α \in ⊤} β_{α} d_{α} (B^{*}) \\ - & \sum_{i \in ⊥} \sum_{α \in ⊤} ln [1 + e^{- γ_{i} - β_{α}}] \end{matrix}

whose first-order optimality conditions read

\begin{matrix} \nabla_{γ_{i}} L_{BiCM} = & - k_{i} (B^{*}) + \sum_{α \in ⊤} \frac{e^{- γ_{i} - β_{α}}}{1 + e^{- γ_{i} - β_{α}}}, i \in ⊥ \\ \nabla_{β_{α}} L_{BiCM} = & - d_{α} (B^{*}) + \sum_{i \in ⊥} \frac{e^{- γ_{i} - β_{α}}}{1 + e^{- γ_{i} - β_{α}}}, α \in ⊤ \end{matrix}

Resolution of the BiCM. As for the DBCM case, Newton’s and the quasi-Newton methods can be implemented by adapting the recipe defined in Eq. (18) to the bipartite case (see “Appendix A” for the definition of the BiCM Hessian).

As for the DBCM, the fixed-point recipe for the BiCM can be re-written in the usual iterative fashion as follows

\begin{matrix} γ_{i}^{(n)} = - ln [\frac{k_{i} (B^{*})}{\sum_{α \in ⊤} (\frac{e^{- β_{α}^{(n - 1)}}}{1 + e^{- γ_{i}^{(n - 1)} - β_{α}^{(n - 1)}}})}], \forall i \in ⊥ \\ β_{α}^{(n)} = - ln [\frac{d_{α} (B^{*})}{\sum_{i \in ⊥} (\frac{e^{- γ_{i}^{(n - 1)}}}{1 + e^{- γ_{i}^{(n - 1)} - β_{α}^{(n - 1)}}})}], \forall α \in ⊤ \end{matrix}

and the initialization is similar as well: in fact, we can employ the value of the solution of the BiCM in the sparse case, i.e. $γ_{i}^{(0)} = - ln [\frac{k_{i} (B^{*})}{\sqrt{L}}]$ , $\forall i \in ⊥$ and $β_{α}^{(0)} = - ln [\frac{d_{α} (B^{*})}{\sqrt{L}}]$ , $\forall α \in ⊤$ (only this set of initial conditions has been employed to analyse the bipartite case).

Reducing the dimensionality of the problem. Exactly as for the DBCM case, the presence of nodes with the same degree, on the same layer, leads to the appearance of identical equations in the system above; hence, the computation of the solutions can be sped up by writing

\begin{matrix} γ_{k}^{(n)} = - ln [\frac{k (B^{*})}{\sum_{d} g (d) (\frac{e^{- β_{k, d}^{(n - 1)}}}{1 + e^{- γ_{k, d}^{(n - 1)} - β_{k, d}^{(n - 1)}}})}], \forall k \\ β_{d}^{(n)} = - ln [\frac{d (B^{*})}{\sum_{k} f (k) (\frac{e^{- γ_{k, d}^{(n - 1)}}}{1 + e^{- γ_{k, d}^{(n - 1)} - β_{k, d}^{(n - 1)}}})}], \forall d \end{matrix}

where f(k) is the number of nodes, belonging to layer $⊥$ , whose degree is k and g(d) is the number of nodes, belonging to layer $⊤$ , whose degree is d.

Performance testing. The performance of the three algorithms in solving Eq. (49) has been tested on 16 snapshot of the bipartite, binary, undirected version of the WTW, gathering the country-product export relationships across the years 1995–2010³⁷. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the BiCM has been quantified via the maximum absolute error metrics that, now, reads

\begin{matrix} MADE & = max {|k_{1}^{*} - ⟨ k_{1} ⟩| \dots |k_{N}^{*} - ⟨ k_{N} ⟩|, \\ |d_{1}^{*} - ⟨ d_{1} ⟩| \dots |d_{M}^{*} - ⟨ d_{M} ⟩|} \end{matrix}

to account for the degrees of nodes on both layers.

The three different ‘stop criteria’ match the ones adopted for analysing the UBCM and the DBCM and consist in a condition on the Euclidean norm of the gradient of the likelihood function, i.e. $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 10}$ , in a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e. $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 10}$ , and a condition on the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in Table 3. Overall, all recipes are accurate, fast and scalable; all methods stop because the condition on the norm of the likelihood is satisfied.

Table 3.

Performance of Newton’s, quasi-Newton and the fixed-point algorithm to solve the reduced system of equations defining the BiCM, on a set of real-world BiUNs (of which basic statistics as the total number of nodes, N, the total number of links, L, and the connectance, $c = L / (N \cdot M)$ , are provided).

	$N + M$	L	c	$c_{r}$	Newton		Quasi-Newton		Fixed-point
	$N + M$	L	c	$c_{r}$	MADE	Time (s)	MADE	Time (s)	MADE	Time (s)
WTW 95	1277	18947	$≃ 1.1 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.1 \cdot 10^{- 13}$	$≃ 0.0022$	$≃ 3.0 \cdot 10^{- 6}$	$≃ 0.012$	$≃ 7.7 \cdot 10^{- 6}$	$≃ 0.012$
WTW 96	1277	19934	$≃ 1.2 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0023$	$≃ 1.5 \cdot 10^{- 6}$	$≃ 0.014$	$≃ 1.1 \cdot 10^{- 4}$	$≃ 0.023$
WTW 97	1277	20222	$≃ 1.2 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.7 \cdot 10^{- 13}$	$≃ 0.0022$	$≃ 3.5 \cdot 10^{- 6}$	$≃ 0.02$	$≃ 2.4 \cdot 10^{- 4}$	$≃ 0.013$
WTW 98	1277	20614	$≃ 1.2 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 2.8 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 1.2 \cdot 10^{- 6}$	$≃ 0.015$	$≃ 1.8 \cdot 10^{- 4}$	$≃ 0.018$
WTW 99	1277	20949	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 2.8 \cdot 10^{- 5}$	$≃ 0.012$	$≃ 2.1 \cdot 10^{- 4}$	$≃ 0.019$
WTW 00	1277	21257	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0025$	$≃ 1.3 \cdot 10^{- 6}$	$≃ 0.016$	$≃ 2.8 \cdot 10^{- 5}$	$≃ 0.018$
WTW 01	1277	21326	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.7 \cdot 10^{- 13}$	$≃ 0.0023$	$≃ 3.4 \cdot 10^{- 5}$	$≃ 0.015$	$≃ 2.5 \cdot 10^{- 5}$	$≃ 0.015$
WTW 02	1277	21333	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 1.7 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 4.1 \cdot 10^{- 6}$	$≃ 0.018$	$≃ 2.1 \cdot 10^{- 4}$	$≃ 0.016$
WTW 03	1277	21330	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 2.8 \cdot 10^{- 13}$	$≃ 0.0023$	$≃ 1.1 \cdot 10^{- 6}$	$≃ 0.015$	$≃ 4.3 \cdot 10^{- 5}$	$≃ 0.014$
WTW 04	1277	21479	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.7 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 2.2 \cdot 10^{- 7}$	$≃ 0.018$	$≃ 2.1 \cdot 10^{- 4}$	$≃ 0.019$
WTW 05	1278	21841	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 2.2 \cdot 10^{- 6}$	$≃ 0.013$	$≃ 2.3 \cdot 10^{- 4}$	$≃ 0.027$
WTW 06	1279	21945	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0023$	$≃ 1.3 \cdot 10^{- 5}$	$≃ 0.016$	$≃ 2.2 \cdot 10^{- 4}$	$≃ 0.012$
WTW 07	1279	22036	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.4 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0024$	$≃ 2.0 \cdot 10^{- 6}$	$≃ 0.017$	$≃ 2.1 \cdot 10^{- 4}$	$≃ 0.023$
WTW 08	1279	21889	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.1 \cdot 10^{- 13}$	$≃ 0.0023$	$≃ 1.5 \cdot 10^{- 5}$	$≃ 0.017$	$≃ 2.5 \cdot 10^{- 4}$	$≃ 0.024$
WTW 09	1279	21621	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0025$	$≃ 2.1 \cdot 10^{- 6}$	$≃ 0.021$	$≃ 2.4 \cdot 10^{- 4}$	$≃ 0.018$
WTW 10	1279	21010	$≃ 1.3 \cdot 10^{- 1}$	$≃ 1.3 \cdot 10^{- 1}$	$≃ 2.3 \cdot 10^{- 13}$	$≃ 0.0022$	$≃ 1.6 \cdot 10^{- 6}$	$≃ 0.015$	$≃ 2.6 \cdot 10^{- 4}$	$≃ 0.022$

Open in a new tab

All algorithms stop because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 10}$ is satisfied. For what concerns both accuracy and speed, the best performing method is Newton’s one, followed by the quasi-Newton and the fixed-point recipes. Only the results corresponding to the best choice of initial conditions are reported.

For what concerns accuracy, the largest maximum error per method spans an interval (across all configurations) that amounts at ${MADE}_{Newton}^{reduced} ≃ 10^{- 13}$ , $10^{- 7} ≲ {MADE}_{Quasi-Newton}^{reduced} ≲ 10^{- 5}$ and $10^{- 5} ≲ {MADE}_{fixed-point}^{reduced} ≲ 10^{- 4}$ . By looking at each specific network, it is evident that the most accurate method is systematically Newton’s one.

For what concerns speed, the amount of time required by each method to achieve convergence spans an interval (across all configurations) that is $T_{Newton}^{reduced} ≃ 0.0023$ (on average), $T_{Quasi-Newton}^{reduced} ≃ 0.016$ (on average) and $T_{fixed-point}^{reduced} ≃ 0.018$ (on average) (time is measured in seconds). The fastest method is Newton’s one and is followed by the quasi-Newton and the fixed-point recipes. The gain in terms of speed due to the reducibility (now quantified by the ratio between the number of different pairs of degrees divided by the total number of nodes) of the system of equations defining the BiCM is also evident: while solving the original problem would have required handling a system of $≃ 10^{3}$ equations, the reduced one is defined by just $≃ 10^{2}$ distinct equations. Overall, a solution is always found within thousandths or hundredths of seconds.

As for the DBCM case, the ensemble of BiBUNs can be sampled by implementing a Bernoulli trial $b_{i α} \sim Ber [p_{i α}]$ for any two nodes (belonging to different layers) in either a sequential or a parallel fashion. The computational complexity of the sampling process amounts at $O (N \cdot M)$ and can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the BiCM ensemble is summed up by Algorithm 3. graphic file with name 41598_2021_93830_Figc_HTML.jpg

UECM: weighted undirected graphs with given strengths and degrees

So far, we have considered purely binary models. Let us now focus on a class of ‘mixed’ null models for weighted networks, defined by constraining both binary and weighted quantities. Purely weighted models such as the Undirected and the Directed Weighted Configuration Model have not been considered since, as it has been proven elsewhere³⁸, they perform quite poorly when employed to reconstruct networks. Let us start by the simplest model, i.e. the one constraining the degrees and the strengths in an undirected fashion. While $k_{i} (A) = \sum_{j (\neq i) = 1}^{N} a_{ij}$ counts the number of neighbors of node i, $s_{i} (W) = \sum_{j (\neq i) = 1}^{N} w_{ij}$ defines the weighted equivalent of the degree of node i, i.e. its strength. For consistency, the binary adjacency matrix can be defined via the Heaviside step function, $Θ [.]$ , as $A \equiv Θ [W]$ a position indicating that $a_{ij} = 1$ if $w_{ij} > 0$ , $\forall i < j$ and zero otherwise. This particular model is known as Undirected Enhanced Configuration Model (UECM)^38–40 and its Hamiltonian reads

\begin{matrix} H_{UECM} (W, \vec{α}, \vec{β}) = \sum_{i = 1}^{N} [α_{i} k_{i} (A) + β_{i} s_{i} (W)] ; \end{matrix}

it induces a probability distribution which is halfway between a Bernoulli and a geometric one³⁹, i.e.

\begin{matrix} Q_{UECM} (W | \vec{α}, \vec{β}) = \prod_{i = 1}^{N} \prod_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} q_{ij} (w_{ij}) \end{matrix}

with

\begin{matrix} q_{ij} (w) = \{\begin{matrix} 1 - p_{ij}^{UECM}, & w = 0 \\ p_{ij}^{UECM} {(e^{- β_{i} - β_{j}})}^{w - 1} (1 - e^{- β_{i} - β_{j}}), & w > 0 \end{matrix}) \end{matrix}

for any two nodes i and j such that $i < j$ and $p_{ij}^{UECM} = \frac{e^{- α_{i} - α_{j} - β_{i} - β_{j}}}{1 - e^{- β_{i} - β_{j}} + e^{- α_{i} - α_{j} - β_{i} - β_{j}}}$ . Notice that the functional form above is obtained upon requiring that the weights only assume (non-negative) integer values (i.e. $w_{ij} \in [0, + \infty)$ , $\forall i < j$ ): hence, the canonical ensemble is now constituted by the weighted configurations with N nodes and a number of (undirected) links ranging between zero and the maximum value $(\begin{matrix} N \\ 2 \end{matrix})$ .

The argument of the problem (6) for the specific network $W^{*}$ now becomes

\begin{matrix} L_{UECM} (\vec{α}, \vec{β}) = - & \sum_{i = 1}^{N} [α_{i} k_{i} (A^{*}) + β_{i} s_{i} (W^{*})] \\ - & \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} ln [1 + e^{- α_{i} - α_{j}} (\frac{e^{- β_{i} - β_{j}}}{1 - e^{- β_{i} - β_{j}}})] \end{matrix}

whose first-order optimality conditions read

\begin{matrix} \nabla_{α_{i}} L_{UECM} = & - k_{i} (A^{*}) + \sum_{j (\neq i) = 1}^{N} p_{ij}^{UECM} \\ = & - k_{i} (A^{*}) + ⟨ k_{i} ⟩ = 0, i = 1 \dots N, \\ \nabla_{β_{i}} L_{UECM} = & - s_{i} (W^{*}) + \sum_{j (\neq i) = 1}^{N} \frac{p_{ij}^{UECM}}{1 - e^{- β_{i} - β_{j}}} \\ = & - s_{i} (W^{*}) + ⟨ s_{i} ⟩ = 0, i = 1 \dots N . \end{matrix}

Resolution of the UECM Newton’s and the quasi-Newton methods can be easily implemented via the recipe defined in Eq. (18) (see “Appendix A” for the definition of the UECM Hessian).

As for the purely binary models, the fixed-point recipe for solving the UECM first-order optimality conditions transforms the following set of consistency equations

\begin{matrix} α_{i} = & - ln [\frac{k_{i} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{j} - β_{i} - β_{j}}}{1 - e^{- β_{i} - β_{j}} + e^{- α_{i} - α_{j} - β_{i} - β_{j}}})}], \\ β_{i} = & - ln [\frac{s_{i} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{i} - α_{j} - β_{j}}}{(1 - e^{- β_{i} - β_{j}}) (1 - e^{- β_{i} - β_{j}} + e^{- α_{i} - α_{j} - β_{i} - β_{j}})})}] \end{matrix}

(with $i = 1 \dots N$ ) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the $(n - 1)$ -th iteration. It is important to remark that a reduced version of the iterative recipe above can indeed be written, by assigning the same pair of values $(α, β)$ to the nodes with the same pair of values (k, s): however, the larger heterogeneity of the strengths causes this event to happen more rarely than for purely binary models such as the UBCM and the DBCM.

As for the purely binary cases, three different sets of initial conditions have been considered, whose definition follows from the simplest conceivable generalization of the purely binary cases. In particular, the first set of values reads $α_{i}^{(0)} = - ln [\frac{k_{i} (A^{*})}{\sqrt{2 L}}]$ , $i = 1 \dots N$ and $β_{i}^{(0)} = - ln [\frac{s_{i} (W^{*})}{\sqrt{2 W}}]$ , $i = 1 \dots N$ ; the second set is a variant of the first, reading $α_{i}^{(0)} = - ln [\frac{k_{i} (A^{*})}{\sqrt{N}}]$ , $i = 1 \dots N$ and $β_{i}^{(0)} = - ln [\frac{s_{i} (W^{*})}{\sqrt{N}}]$ , $i = 1 \dots N$ ; the third recipe, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. $α_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ and $β_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ .

Performance testing The performance of the three algorithms to solve the system of equations defining the UECM has been tested on a bunch of real-world networks. In particular, we have considered the WTW during the decade 1990–2000⁴¹. Since the weights defining the configurations of the WTW are real numbers, we have rounded them to the nearest integer value, before running the UECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UECM has been now quantified via the maximum relative error metrics, defined, in a perfectly general fashion, as ${max}_{i} {\{\frac{| C_{i}^{*} - ⟨ C_{i} ⟩ |}{C_{i}}\}}_{i = 1}^{N}$ (where $C_{i}^{*}$ is the empirical value of the i-th constraint, $C_{i}$ ). In the UECM case, we can define two variants of the aforementioned error, i.e.

\begin{matrix} MRDE = & max_{i} {\{\frac{| k_{i}^{*} - ⟨ k_{i} ⟩ |}{k_{i}}\}}_{i = 1}^{N} \end{matrix}

\begin{matrix} MRSE = & max_{i} {\{\frac{| s_{i}^{*} - ⟨ s_{i} ⟩ |}{s_{i}}\}}_{i = 1}^{N} \end{matrix}

(the acronyms standing for Maximum Relative Degree Error and Maximum Relative Strength Error). The reason driving this choice lies in the evidence that, in absolute terms, strengths are affected by a larger numerical error than degrees: this, however, doesn’t necessarily mean that a given algorithm performs poorly, as the magnitude of an error must be always compared with the numerical value of the quantity it refers to - whence the choice of considering relative scores.

The three different ‘stop criteria’ we have considered for each algorithm match the ones adopted for analysing the binary cases, consisting in a condition on the Euclidean norm of the gradient of the likelihood function, i.e. $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ , and in a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e. $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 8}$ . The third condition concerns the maximum number of iterations: after 10000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in Table 4. Overall, two out of three algorithms (i.e. Newton’s and the quasi-Newton methods) perform very satisfactorily, being accurate, fast and scalable; the third one (i.e. the fixed-point recipe), instead, performs very poorly. Moreover, while Newton’s method stops because the condition on the norm of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10000 steps and, then, stop).

Table 4.

Performance of Newton’s and the quasi-Newton method to solve the reduced system of equations defining the UECM, on a set of real-world WUNs (of which basic statistics as the total number of nodes, N, the total number of links, L, and the connectance, $c = 2 L / N (N - 1)$ , are provided).

	N	L	c	Newton				Quasi-Newton
	N	L	c	MRDE	MASE	MRSE	Time (s)	MRDE	MASE	MRSE	Time (s)
WTW 90	169	7991	$≃ 0.3$	$≃ 2.6 \cdot 10^{- 10}$	$≃ 5 \cdot 10^{- 6}$	$≃ 2 \cdot 10^{- 10}$	$≃ 0.4$	$≃ 7.8 \cdot 10^{- 4}$	$≃ 2 \cdot 10^{- 1}$	$≃ 3 \cdot 10^{- 4}$	$≃ 25$
WTW 91	184	8712	$≃ 0.3$	$≃ 2.1 \cdot 10^{- 10}$	$≃ 1 \cdot 10^{- 10}$	$≃ 1.2 \cdot 10^{- 10}$	$≃ 0.5$	$≃ 9.2 \cdot 10^{- 5}$	$≃ 7 \cdot 10^{- 2}$	$≃ 6.6 \cdot 10^{- 5}$	$≃ 28$
WTW 92	185	8928	$≃ 0.3$	$≃ 1.2 \cdot 10^{- 10}$	$≃ 7 \cdot 10^{- 7}$	$≃ 1.3 \cdot 10^{- 10}$	$≃ 0.5$	$≃ 7 \cdot 10^{- 2}$	$≃ 2 \cdot 10^{- 4}$	$≃ 1.3 \cdot 10^{- 4}$	$≃ 28$
WTW 93	187	9220	$≃ 0.3$	$≃ 3.4 \cdot 10^{- 6}$	$≃ 2 \cdot 10^{- 10}$	$≃ 2.2 \cdot 10^{- 10}$	$≃ 0.5$	$≃ 1.4 \cdot 10^{- 4}$	$≃ 1 \cdot 10^{- 1}$	$≃ 7.6 \cdot 10^{- 5}$	$≃ 28$
WTW 94	187	9437	$≃ 0.3$	$≃ 1.8 \cdot 10^{- 10}$	$≃ 2 \cdot 10^{- 10}$	$≃ 1.8 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 7 \cdot 10^{- 5}$	$≃ 2 \cdot 10^{- 1}$	$≃ 1.5 \cdot 10^{- 4}$	$≃ 28$
WTW 95	187	9578	$≃ 0.3$	$≃ 2.8 \cdot 10^{- 10}$	$≃ 3 \cdot 10^{- 6}$	$≃ 2.9 \cdot 10^{- 10}$	$≃ 0.6$	$≃ 2.5 \cdot 10^{- 4}$	$≃ 3 \cdot 10^{- 1}$	$≃ 6.7 \cdot 10^{- 5}$	$≃ 28$
WTW 96	187	10002	$≃ 0.3$	$≃ 1.4 \cdot 10^{- 5}$	$≃ 1 \cdot 10^{- 10}$	$≃ 1.1 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 1.7 \cdot 10^{- 5}$	$≃ 2 \cdot 10^{- 1}$	$≃ 3 \cdot 10^{- 5}$	$≃ 28$
WTW 97	187	10251	$≃ 0.3$	$≃ 1.1 \cdot 10^{- 5}$	$≃ 4 \cdot 10^{- 10}$	$≃ 6.7 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 4.4 \cdot 10^{- 5}$	$≃ 8 \cdot 10^{- 2}$	$≃ 1.6 \cdot 10^{- 4}$	$≃ 28$
WTW 98	187	10254	$≃ 0.3$	$≃ 1 \cdot 10^{- 5}$	$≃ 3 \cdot 10^{- 10}$	$≃ 4 \cdot 10^{- 10}$	$≃ 0.6$	$≃ 1.7 \cdot 10^{- 4}$	$≃ 8 \cdot 10^{- 2}$	$≃ 5.3 \cdot 10^{- 5}$	$≃ 28$
WTW 99	187	10252	$≃ 0.3$	$≃ 4.7 \cdot 10^{- 10}$	$≃ 1 \cdot 10^{- 10}$	$≃ 8 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 1.6 \cdot 10^{- 4}$	$≃ 7 \cdot 10^{- 2}$	$≃ 6.2 \cdot 10^{- 5}$	$≃ 28$
WTW 00	187	10252	$≃ 0.3$	$≃ 5 \cdot 10^{- 10}$	$≃ 2 \cdot 10^{- 10}$	$≃ 2.4 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 1.5 \cdot 10^{- 4}$	$≃ 9 \cdot 10^{- 2}$	$≃ 5.4 \cdot 10^{- 5}$	$≃ 29$

Open in a new tab

While Newton’s method stops because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied, the quasi-Newton one always reaches the limit of 10000 steps. The results on accuracy and speed clearly indicate that Newton’s method outperforms the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported. The results of the fixed-point recipe are not shown.

For what concerns accuracy, the largest maximum error made by Newton’s method (across all configurations) amounts at $10^{- 10} ≲ {MRDE}_{Newton} ≲ 10^{- 5}$ and ${MRSE}_{Newton} ≳ 10^{- 10}$ ; on the other hand, the largest maximum error made by the quasi-Newton method (across all configurations) amounts at $10^{- 5} ≲ {MRDE}_{Quasi-Newton} ≲ 10^{- 1}$ and $10^{- 5} \leq {MRSE}_{Quasi-Newton} \leq 10^{- 4}$ . For what concerns speed, Newton’s method employs tenths of seconds to achieve convergence on each configuration while the quasi-Newton one always requires tens of seconds (specifically, almost thirty seconds for each considered configuration). The results above indicate that the fastest and two most accurate method is systematically Newton’s one, suggesting that the ‘complexity’ of the model is such that the information encoded into the Hessian matrix cannot be ignored without consequences on the quality of the solution. The fixed-point algorithm, instead, stops after seconds but is affected by errors whose order of systematically magnitude amounts at ${MRDE}_{fixed-point} ≃ 10^{2}$ and $1 ≲ {MRSE}_{fixed-point} ≲ 10^{2}$ .

We also explicitly notice that the MADE basically coincides with the MRDE for all considered configurations, meaning that the largest error, made by the algorithms considered here to solve the UECM, affects the nodes with the lowest degree (i.e. equal to one). On the other hand, strengths are affected by a larger absolute error (i.e. the MASE, defined as $MASE = {max}_{i} {|s_{i}^{*} - ⟨ s_{i} ⟩|}_{i = 1}^{N}$ ) than the degrees: if we calculate the MRSE, however, we realize that the largest errors affect very large strengths - hence being perfectly acceptable. For example, let us consider the WTW in 1993: the MASE amounts at 0.1 but, as the MRSE reveals, it affects a strength of the order of $10^{3}$ .

Lastly, differences in the speed of convergence of the two methods discussed in this section, caused by the choice of a particular set of initial conditions, are observable: the ‘uniform’ prescription outperforms the other ones.

Finally, let us comment on the algorithm to sample the UECM ensemble and that can be compactly achieved by implementing a two-step procedure. Let us look back at the formal expression for the pair-specific probability distribution characterizing the UECM: it induces coefficients reading

\begin{matrix} \{\begin{matrix} 1 - p_{ij}^{UECM}, & w = 0 \\ p_{ij}^{UECM} (1 - e^{- β_{i} - β_{j}}), & w = 1 \\ p_{ij}^{UECM} (e^{- β_{i} - β_{j}}) (1 - e^{- β_{i} - β_{j}}), & w = 2 \\ p_{ij}^{UECM} {(e^{- β_{i} - β_{j}})}^{2} (1 - e^{- β_{i} - β_{j}}), & w = 3 \\ ⋮ \end{matrix}) \end{matrix}

in turn suggesting that, for a specific pair of vertices i, j (with $i < j$ ), the appearance of the first link is ruled by a Bernoulli distribution with probability $p_{ij}^{UECM}$ while the remaining $(w - 1)$ ones can be drawn from a geometric distribution whose parameter reads $e^{- β_{i} - β_{j}}$ ; in other words, the weight $(w - 1)$ is drawn conditionally on the presence of a connection between the two considered nodes. The computational complexity of the sampling process is, again, $O (N^{2})$ . The pseudo-code for explicitly sampling the DBCM ensemble is summed up by Algorithm 4. Notice that the way our sampling procedure is written requires the support of the geometric distribution to coincide with the positive integers. graphic file with name 41598_2021_93830_Figd_HTML.jpg

DECM: weighted directed graphs with given strengths and degrees

Let us now extend the ‘mixed’ model introduced in the previous section to the case of directed networks. Constraints are, now, represented by four sequences of values, i.e. ${k_{i}^{out}}_{i = 1}^{N}$ , ${k_{i}^{in}}_{i = 1}^{N}$ , ${s_{i}^{out}}_{i = 1}^{N}$ , ${s_{i}^{in}}_{i = 1}^{N}$ where the generic out-degree and in-degree are, respectively, defined as $k_{i}^{out} (A) = \sum_{j (\neq i) = 1}^{N} a_{ij}$ and $k_{i}^{in} (A) = \sum_{j (\neq i) = 1}^{N} a_{ji}$ and analogously for the generic out-strength and in-strength, reading $s_{i}^{out} (W) = \sum_{j (\neq i) = 1}^{N} w_{ij}$ and $s_{i}^{in} (W) = \sum_{j (\neq i) = 1}^{N} w_{ji}$ . Consistency requires that $A \equiv Θ [W]$ as for the UECM case. This model is known as Directed Enhanced Configuration Model (DECM) and its Hamiltonian reads

\begin{matrix} H_{DECM} (W, \vec{α}, \vec{β}, \vec{γ}, \vec{δ}) = & \sum_{i = 1}^{N} [α_{i} k_{i}^{out} (A) + β_{i} k_{i}^{in} (A) \\ + γ_{i} s_{i}^{out} (W) + δ_{i} s_{i}^{in} (W)] \end{matrix}

in turn, inducing the directed counterpart of the UECM distribution, i.e.

\begin{matrix} Q_{DECM} (W | \vec{α}, \vec{β}, \vec{γ}, \vec{δ}) = \prod_{i = 1}^{N} \prod_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} q_{ij} (w_{ij}) \end{matrix}

with

\begin{matrix} q_{ij} (w) = \{\begin{matrix} 1 - p_{ij}^{DECM} & w = 0 \\ p_{ij}^{DECM} {(e^{- γ_{i} - δ_{j}})}^{w - 1} (1 - e^{- γ_{i} - δ_{j})} & w > 0 \end{matrix}) \end{matrix}

for any two nodes i and j such that $i \neq j$ and $p_{ij}^{DECM} = \frac{e^{- α_{i} - β_{j} - γ_{i} - δ_{j}}}{1 - e^{- γ_{i} - δ_{j}} + e^{- α_{i} - β_{j} - γ_{i} - δ_{j}}}$ . As for the undirected case, weights are required to assume only (non-negative) integer values (i.e. $w_{ij} \in [0, + \infty)$ , $\forall i \neq j$ ): hence, the canonical ensemble is constituted by the weighted configurations with N nodes and a number of (directed) links ranging between zero and the maximum value $N (N - 1)$ .

The argument of the problem (14) for the specific network $W^{*}$ becomes

\begin{matrix} L_{DECM} (\vec{α}, \vec{β}, \vec{γ}, \vec{δ}) = - & \sum_{i = 1}^{N} [α_{i} k_{i}^{out} (A^{*}) + β_{i} k_{i}^{in} (A^{*}) \\ + & γ_{i} s_{i}^{out} (W^{*}) + δ_{i} s_{i}^{in} (W^{*})] \\ - & \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} ln z_{ij} \end{matrix}

where $z_{ij} = [1 + e^{- α_{i} - β_{j}} (\frac{e^{- γ_{i} - δ_{j}}}{1 - e^{- γ_{i} - δ_{j}}})]$ , $\forall i \neq j$ and whose first-order optimality conditions read

\begin{matrix} \nabla_{α_{i}} L_{DECM} = & - k_{i}^{out} (A^{*}) + \sum_{j (\neq i) = 1}^{N} p_{ij}^{DECM} \\ = & - k_{i}^{out} (A^{*}) + ⟨ k_{i}^{out} ⟩ = 0, i = 1 \dots N, \\ \nabla_{β_{i}} L_{DECM} = & - k_{i}^{in} (A^{*}) + \sum_{j (\neq i) = 1}^{N} p_{ji}^{DECM} \\ = & - k_{i}^{in} (A^{*}) + ⟨ k_{i}^{in} ⟩ = 0, i = 1 \dots N, \\ \nabla_{γ_{i}} L_{DECM} = & - s_{i}^{out} (W^{*}) + \sum_{j (\neq i) = 1}^{N} \frac{p_{ij}^{DECM}}{1 - e^{- γ_{i} - δ_{j}}} \\ = & - s_{i}^{out} (W^{*}) + ⟨ s_{i}^{out} ⟩ = 0, i = 1 \dots N, \\ \nabla_{δ_{i}} L_{DECM} = & - s_{i}^{in} (W^{*}) + \sum_{j (\neq i) = 1}^{N} \frac{p_{ji}^{DECM}}{1 - e^{- γ_{j} - δ_{i}}} \\ = & - s_{i}^{in} (W^{*}) + ⟨ s_{i}^{in} ⟩ = 0, i = 1 \dots N . \end{matrix}

Resolution of the DECM. Newton’s and the quasi-Newton methods can be easily implemented via the recipe defined in Eq. (18) (see “Appendix A” for the definition of the DECM Hessian).

As for the UECM, the fixed-point recipe for solving the DECM first-order optimality conditions transforms the following set of consistency equations

\begin{matrix} α_{i} = & - ln [\frac{k_{i}^{out} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- β_{j} - γ_{i} - δ_{j}}}{1 - e^{- γ_{i} - δ_{j}} + e^{- α_{i} - β_{j} - γ_{i} - δ_{j}}})}], \\ β_{i} = & - ln [\frac{k_{i}^{in} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{j} - γ_{j} - δ_{i}}}{1 - e^{- γ_{j} - δ_{i}} + e^{- α_{j} - β_{i} - γ_{j} - δ_{i}}})}], \\ γ_{i} = & - ln [\frac{s_{i}^{out} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{i} - β_{j} - δ_{j}}}{(1 - e^{- γ_{i} - δ_{j}}) (1 - e^{- γ_{i} - δ_{j}} + e^{- α_{i} - β_{j} - γ_{i} - δ_{j}})})}], \\ δ_{i} = & - ln [\frac{s_{i}^{in} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- α_{j} - β_{i} - γ_{j}}}{(1 - e^{- γ_{j} - δ_{i}}) (1 - e^{- γ_{j} - δ_{i}} + e^{- α_{j} - β_{i} - γ_{j} - δ_{i}}})}] . \end{matrix}

(with $i = 1 \dots N$ ) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the $(n - 1)$ -th iteration. The reduced version of such a recipe would assign the same set of values $(α, β, γ, δ)$ to the nodes for which the quantities $(k^{out}, k^{in}, s^{out}, s^{in})$ have the same value: however, the larger heterogeneity of the strengths causes the DECM to be much less reducible than the purely binary models we have considered in the present contribution.

The three different sets of initial conditions that have been considered generalize the UECM ones: in particular, the first set of values reads $α_{i}^{(0)} = - ln [\frac{k_{i}^{out} (A^{*})}{\sqrt{L}}]$ , $i = 1 \dots N$ , $β_{i}^{(0)} = - ln [\frac{k_{i}^{in} (A^{*})}{\sqrt{L}}]$ , $i = 1 \dots N$ , $γ_{i}^{(0)} = - ln [\frac{s_{i}^{out} (W^{*})}{\sqrt{W}}]$ , $i = 1 \dots N$ and $δ_{i}^{(0)} = - ln [\frac{s_{i}^{in} (W^{*})}{\sqrt{W}}]$ , $i = 1 \dots N$ ; the second set of initial conditions can be obtained by simply replacing L with N; the third recipe, as usual, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval.

Performance testing The performance of the three algorithms to solve the system of equations defining the DECM has been tested on a bunch of real-world networks. In particular, we have considered the Electronic Italian Interbank Market (e-MID) during the decade 2000–2010⁴². Since e-MID weights are real numbers, we have rounded them to the nearest integer value, before running the DECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out

The accuracy of each algorithm in reproducing the constraints defining the DECM has been quantified via the maximum relative error metrics, now reading

\begin{matrix} MRDE = & max_{i} {\{\frac{| k_{i}^{*} - ⟨ k_{i} ⟩ |}{k_{i}}, \frac{| h_{i}^{*} - ⟨ h_{i} ⟩ |}{h_{i}}\}}_{i = 1}^{N} \end{matrix}

\begin{matrix} MRSE = & max_{i} {\{\frac{| s_{i}^{*} - ⟨ s_{i} ⟩ |}{s_{i}}, \frac{| t_{i}^{*} - ⟨ t_{i} ⟩ |}{t_{i}}\}}_{i = 1}^{N} \end{matrix}

(the acronyms standing for for Maximum Relative Degree Error and Maximum Relative Strength Error) where we have defined $k^{out} \equiv k$ , $k^{in} \equiv h$ , $s^{out} \equiv s$ and $s^{in} \equiv t$ in order to simplify the formalism.

The three different ‘stop criteria’ we have adopted are the same ones we have considered for both the binary and the undirected, ‘mixed’ model, i.e. the condition on the Euclidean norm of the gradient of the likelihood function, i.e. $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ ), the condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations (i.e. $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 8}$ ) and the condition on the maximum number of iterations (i.e. after 10000 steps, any of the three algorithms stops).

The results about the performance of our three algorithms are reported in Table 5. Overall, Newton’s method performs very satisfactorily, being accurate, fast and scalable; the quasi-Newton method is accurate as well although (in some cases, much) slower. The fixed-point recipe, instead, performs very poorly, as for the undirected case. Moreover, while Newton’s method stops because the condition on the norm of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10,000 steps and, then, stop).

Table 5.

Performance of Newton’s and the quasi-Newton method to solve the reduced system of equations defining the DECM, on a set of real-world WDNs (of which basic statistics as the total number of nodes, N, the total number of links, L, and the connectance, $c = L / N (N - 1)$ , are provided).

	N	L	c	Newton				Quasi-Newton
	N	L	c	MRDE	MASE	MRSE	Time (s)	MRDE	MASE	MRSE	Time (s)
e-MID 00	196	10618	$≃ 0.28$	$≃ 1.4 \cdot 10^{- 10}$	$≃ 5 \cdot 10^{- 9}$	$≃ 1.5 \cdot 10^{- 10}$	$≃ 0.5$	$≃ 1.7 \cdot 10^{- 7}$	$≃ 3 \cdot 10^{- 1}$	$≃ 5.4 \cdot 10^{- 7}$	$≃ 0.1$
e-MID 01	185	8951	$≃ 0.26$	$≃ 1.4 \cdot 10^{- 11}$	$≃ 6 \cdot 10^{- 9}$	$≃ 2 \cdot 10^{- 10}$	$≃ 0.4$	$≃ 1.4 \cdot 10^{- 7}$	$≃ 10$	$≃ 7 \cdot 10^{- 5}$	$≃ 0.2$
e-MID 02	177	7252	$≃ 0.23$	$≃ 1.4 \cdot 10^{- 15}$	$≃ \cdot 10^{- 4}$	$≃ 1 \cdot 10^{- 5}$	$≃ 0.5$	$≃ 9.5 \cdot 10^{- 8}$	$≃ 6 \cdot 10^{- 1}$	$≃ 7.4 \cdot 10^{- 6}$	$≃ 0.1$
e-MID 03	179	6814	$≃ 0.21$	$≃ 1.6 \cdot 10^{- 10}$	$≃ 2 \cdot 10^{- 5}$	$≃ 4.4 \cdot 10^{- 10}$	$≃ 0.9$	$≃ 9.6 \cdot 10^{- 8}$	$≃ 50$	$≃ 1.1 \cdot 10^{- 3}$	$≃ 0.2$
e-MID 04	180	6136	$≃ 0.19$	$≃ 6.5 \cdot 10^{- 13}$	$≃ 9 \cdot 10^{- 7}$	$≃ 3.4 \cdot 10^{- 12}$	$≃ 0.9$	$≃ 1 \cdot 10^{- 7}$	$≃ 700$	$≃ 4.2 \cdot 10^{- 3}$	$≃ 0.2$
e-MID 05	176	6203	$≃ 0.2$	$≃ 3 \cdot 10^{- 12}$	$≃ 1 \cdot 10^{- 5}$	$≃ 9.4 \cdot 10^{- 11}$	$≃ 1.2$	$≃ 4.8 \cdot 10^{- 8}$	$≃ 300$	$≃ 2.4 \cdot 10^{- 3}$	$≃ 0.3$
e-MID 06	177	6132	$≃ 0.19$	$≃ 1.5 \cdot 10^{- 14}$	$≃ 2 \cdot 10^{- 7}$	$≃ 1.8 \cdot 10^{- 11}$	$≃ 0.9$	$≃ 5 \cdot 10^{- 8}$	$≃ 60$	$≃ 2.5 \cdot 10^{- 3}$	$≃ 0.2$
e-MID 07	178	6330	$≃ 0.2$	$≃ 8.4 \cdot 10^{- 15}$	$≃ 1 \cdot 10^{- 4}$	$≃ 2.5 \cdot 10^{- 6}$	$≃ 0.7$	$≃ 1.8 \cdot 10^{- 8}$	$≃ 3$	$≃ 7.6 \cdot 10^{- 5}$	$≃ 0.1$
e-MID 08	173	4767	$≃ 0.16$	$≃ 2.3 \cdot 10^{- 9}$	$≃ 2 \cdot 10^{- 9}$	$≃ 7.7 \cdot 10^{- 10}$	$≃ 0.6$	$≃ 1.8 \cdot 10^{- 8}$	$≃ 20$	$≃ 1.2 \cdot 10^{- 3}$	$≃ 0.3$
e-MID 09	156	2961	$≃ 0.12$	$≃ 2.6 \cdot 10^{- 15}$	$≃ 1 \cdot 10^{- 7}$	$≃ 4.9 \cdot 10^{- 12}$	$≃ 0.6$	$≃ 1.7 \cdot 10^{- 8}$	$≃ 1$	$≃ 9 \cdot 10^{- 5}$	$≃ 0.1$
e-MID 10	135	2743	$≃ 0.15$	$≃ 6.3 \cdot 10^{- 8}$	$≃ 3 \cdot 10^{- 6}$	$≃ 5.9 \cdot 10^{- 10}$	$≃ 0.7$	$≃ 1.6 \cdot 10^{- 8}$	$≃ 4$	$≃ 5.2 \cdot 10^{- 5}$	$≃ 0.1$

Open in a new tab

For what concerns accuracy, the largest maximum error made by Newton’s method (across all configurations) amounts at $10^{- 14} ≲ {MRDE}_{Newton} ≲ 10^{- 7}$ and $10^{- 12} ≲ {MRSE}_{Newton} ≲ 10^{- 5}$ ; on the other hand, the largest maximum error made by the quasi-Newton method (across all configurations) amounts at $10^{- 8} ≲ {MRDE}_{Quasi-Newton} ≲ 10^{- 7}$ and $10^{- 4} ≲ {MRSE}_{Quasi-Newton} ≲ 10^{- 3}$ . For what concerns speed, Newton’s method employs tens of seconds to achieve convergence on each configuration; the time required by the quasi-Newton method is of the same order of magnitude, although it is systematically larger than the time required by Newton’s one. Overall, these results indicate that the fastest and two most accurate method is Newton’s one. As in the undirected case, the fixed-point algorithm, instead, stops after seconds but is affected by errors whose order of systematically magnitude amounts at $10 ≲ {MRDE}_{fixed-point} ≲ 10^{2}$ and $1 ≲ {MRSE}_{fixed-point} ≲ 10^{2}$ .

As for the UECM, the MADE basically coincides with the MRDE, for all considered configurations, while strengths are affected by a larger absolute error than the degrees: still, upon calculating the MRSE, we realize that the largest errors affect very large strengths - hence being perfectly acceptable.

Finally, let us comment on the algorithm to sample the DECM ensemble: as for the UECM, it can be compactly achieved by implementing the directed counterpart of the two-step procedure described above. Given a specific pair of vertices i, j (with $i \neq j$ ), the first link can be drawn by sampling a Bernoulli distribution with probability $p_{ij}^{DECM}$ while the remaining $(w - 1)$ ones can be drawn from a geometric distribution whose parameter reads $e^{- γ_{i} - δ_{j}}$ . The computational complexity of the sampling process is, again, $O (N^{2})$ and the pseudo-code for explicitly sampling the DECM ensemble is summed up by Algorithm 5. Notice that the way our sampling procedure is written requires the support of the geometric distribution to coincide with the positive integers. graphic file with name 41598_2021_93830_Fige_HTML.jpg

Two-step models for undirected and directed networks

The need of considering network models defined in a two-step fashion arises from a number of considerations. First, the amount of information concerning binary and weighted quantities is often asymmetric: as it has been pointed out in⁴³, information concerning a given network structure ranges from the knowledge of just a single, aggregated piece of information (e.g. the link density) to that of entire subgraphs. Indeed, models exist that take as input any binary, either probabilistic or deterministic, network model (i.e. any $P (A)$ ) while placing link weights optimally, conditionally on the input configurations^19,43.

Second, recipes like the UECM and the DECM are, generally speaking, difficult to solve; as we have already observed, only Newton’s method performs in a satisfactory way, both for what concerns accuracy and speed: hence, easier-to-solve recipes are welcome.

In what follows, we will consider the conditional reconstruction method (hereby, CReM) induced by the Hamiltonian

\begin{matrix} H_{CReM} (W, \vec{θ}) = & \sum_{i = 1}^{N} θ_{i} s_{i} (A) ; \end{matrix}

in case of undirected networks, it induces a conditional probability distribution reading

\begin{matrix} Q (W | A) = \prod_{i = 1}^{N} \prod_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} q_{ij} (w_{ij} | a_{ij}) \end{matrix}

where, for consistency, $q_{ij} (w_{ij} = 0 | a_{ij} = 0) = 1$ and $q_{ij} (w_{ij} = 0 | a_{ij} = 1) = 0$ . The meaning of these relationships is the following: given any two nodes i and j, the absence of a link, i.e. $a_{ij} = 0$ , admits the only possibility $w_{ij} = 0$ ; on the other hand, the presence of a link, i.e. $a_{ij} = 1$ , rules out the possibility that a null weight among the same vertices is observed.

In general, the functional form of $q_{ij} (w_{ij} | a_{ij} = 1)$ depends on the domain of the weights. In all cases considered in^19,43, weights are assumed to be continuous; since the continuous distribution that maximizes Shannon entropy while constrained to reproduce first-order moments is the exponential one, the following functional form

\begin{matrix} q_{ij} (w_{ij} | a_{ij} = 1) = \{\begin{matrix} (θ_{i} + θ_{j}) e^{- (θ_{i} + θ_{j}) w} & w > 0 \\ 0 & w \leq 0 \end{matrix}) \end{matrix}

(for any undirected pair of nodes) remains naturally induced. As shown in⁴³, the problem (14) has to be slightly generalized; still, its argument for the specific network $W^{*}$ becomes

\begin{matrix} G_{CReM} = - \sum_{i = 1}^{N} θ_{i} s_{i} (W^{*}) + \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} f_{ij} log [θ_{i} + θ_{j}] \end{matrix}

where the quantity $f_{ij} = \sum_{A} P (A) a_{ij}$ represents the expected value of $a_{ij}$ over the ensemble of binary configurations defining the binary model taken as input (i.e. the marginal probability of an edge existing between nodes i and j). It follows that the CReM first-order optimality conditions read

\begin{matrix} \nabla_{θ_{i}} L_{CReM} = & - s_{i} (W^{*}) + \sum_{j (\neq i) = 1}^{N} \frac{f_{ij}}{θ_{i} + θ_{j}} \\ = & - s_{i} (W^{*}) + ⟨ s_{i} ⟩ = 0, i = 1 \dots N . \end{matrix}

Resolution of the CReM Newton’s and the quasi-Newton method can still be implemented via the recipe defined in Eq. (18) (see “Appendix A” for the definition of the CReM Hessian).

As for the UECM and the DECM, the fixed-point recipe for solving the system of equations embodying the CReM transforms the set of consistency equations

\begin{matrix} θ_{i} = {[\frac{s_{i} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{f_{ij}}{1 + θ_{j} / θ_{i}})}]}^{- 1}, i = 1 \dots N \end{matrix}

into an iterative recipe of the usual form, i.e. by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the $(n - 1)$ -th iteration. Although a reduced recipe can, in principle, be defined, an analogous observation to the one concerning the UECM and the DECM holds: the mathematical nature of the strengths (now, real numbers) increases their heterogeneity, in turn causing the CReM algorithm to be reducible even less than the ‘mixed’ models defined by discrete weights.

The initialization of the iterative recipe for solving the CReM has been implemented in the usual threefold way. The first set of initial values reads $θ_{i}^{(0)} = - ln [\frac{s_{i} (W^{*})}{\sqrt{2 W}}]$ , $i = 1 \dots N$ ; the second one is a variant of the position above, reading $θ_{i}^{(0)} = - ln [\frac{s_{i} (W^{*})}{\sqrt{N}}]$ ; the third one, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. $θ_{i}^{(0)} \sim U (0, 1)$ , $\forall i$ .

When considering directed networks, the conditional probability distribution defining the CReM reads

\begin{matrix} q_{ij} (w_{ij} | a_{ij} = 1) = \{\begin{matrix} (α_{i} + β_{j}) e^{- (α_{i} + β_{j}) w} & w > 0 \\ 0 & w \leq 0 \end{matrix}) \end{matrix}

for any two nodes i and j such that $i \neq j$ ; the set of Eq. (73) can be generalized as follows

\begin{matrix} α_{i} = {[\frac{s_{i}^{out} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{f_{ij}}{1 + β_{j} / α_{i}})}]}^{- 1}, i = 1 \dots N \\ β_{i} = {[\frac{s_{i}^{in} (W^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{f_{ji}}{1 + α_{j} / β_{i}})}]}^{- 1}, i = 1 \dots N \end{matrix}

and analogously for the sets of values initializing them.

Rescaling the CReM algorithm. Although the equations defining the CReM algorithm cannot be effectively reduced, they can be opportunely rescaled. To this aim, let us consider directed configurations and the system

\begin{matrix} \{\begin{matrix} \sum_{\begin{matrix} j = 1 \\ j (\neq i) \end{matrix}}^{N} \frac{f_{ij}}{α_{i} (κ) + β_{j} (κ)} & = \frac{s_{i}^{out} (W^{*})}{κ}, \forall i \\ \sum_{\begin{matrix} j = 1 \\ j (\neq i) \end{matrix}}^{N} \frac{f_{ji}}{α_{j} (κ) + β_{i} (κ)} & = \frac{s_{i}^{in} (W^{*})}{κ}, \forall i \end{matrix}) \end{matrix}

where the sufficient statistics has been divided by an opportunely defined factor (in this case, $κ$ ) and the symbols $α_{i} (κ)$ , $α_{j} (κ)$ , $β_{i} (κ)$ and $β_{j} (κ)$ stress that the solution we are searching for is a function of the parameter $κ$ itself. In fact, a solution of the system above reads

\begin{matrix} α_{i}^{*} = & \frac{α_{i}^{*} (κ)}{κ}, \forall i \end{matrix}

\begin{matrix} β_{i}^{*} = & \frac{β_{i}^{*} (κ)}{κ}, \forall i \end{matrix}

as it can be proven upon substituting it back into Eq. (76) and noticing that ${α_{i}^{*}}_{i = 1}^{N}$ and ${β_{i}^{*}}_{i = 1}^{N}$ are solutions of the system of equations defined in (76). As our likelihood maximization problem admits a unique, global maximum, the prescription above allows us to easily identify it. Rescaling will be tested in order to find out if our algorithms are enhanced by it under some respect (e.g. accuracy or speed).

Performance testing Before commenting on the performance of the three algorithms in solving the system of equations defining the CReM, let us stress once more that the formulas presented so far are perfectly general, working for any binary recipe one may want to employ. In what follows, we will test the CReM by posing $f_{ij} \equiv p_{ij}^{UBCM}$ and $f_{ij} \equiv p_{ij}^{DBCM}$ .

To test the effectiveness of our algorithms in solving the CReM on undirected networks we have considered the synaptic network of the worm C. Elegans²⁸ and the eight daily snapshots of the Bitcoin Lightning Network³²; the directed version of the CReM has, instead, been solved on the Electronic Italian Interbank Market (e-MID) during the decade 2000-2010⁴². Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

As for the discrete ‘mixed’ models, the accuracy of each algorithm in reproducing the constraints defining the CReM has been quantified via the Maximum Relative Degree Error and the Maximum Relative Strength Error metrics, whose definition is provided by Eqs. (57), (58) and (66), (67) for the undirected and the directed case, respectively. Analogously, the three ‘stop criteria’ for each algorithm are the same ones that we have adopted for the other models (and consist in a condition on the Euclidean norm of the gradient of the likelihood function, i.e. $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ , a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e. $| | Δ \vec{θ} {| |}_{2} \leq 10^{- 8}$ , and a condition on the maximum number of iterations, i.e. 10,000 steps).

The results about the performance of our three algorithms are reported in Tables 6 and 7. Let us start by commenting the results reported in Table 6 and concerning undirected networks. Generally speaking, Newton’s method is the most accurate one (its largest maximum errors span intervals, across all configurations, that amount at $10^{- 11} ≲ {MRDE}_{Newton} ≲ 10^{- 8}$ and $10^{- 5} ≲ {MRSE}_{Newton} ≲ 10^{- 4}$ ) although it scales very badly with the size of the network on which it is tested (the amount of time, measured in seconds, required by it to achieve convergence spans an interval, across all configurations, that amounts at $0.08 \leq T_{Newton}^{reduced} \leq 1188)$ .

Table 6.

Performance of Newton’s, quasi-Newton and the fixed-point algorithm to solve the system of equations defining the (undirected version of the) CReM, on a set of real-world WUNs. While Newton’s and the fixed-point method stop because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied, the quasi-Newton one often reaches the limit of 10,000 steps.

	Newton				Quasi-Newton				Fixed-point
	MRDE	MASE	MRSE	Time (s)	MRDE	MASE	MRSE	Time (s)	MRDE	MASE	MRSE	Time (s)
${BLN}_{1}$	$≃ 4 \cdot 10^{- 8}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 0.08$	$≃ 6 \cdot 10^{- 8}$	$≃ 10^{- 4}$	$≃ 10^{- 6}$	$≃ 5$	$≃ 2 \cdot 10^{- 9}$	$≃ 10^{- 2}$	$≃ 10^{- 1}$	$≃ 0.01$
${BLN}_{2}$	$≃ 2 \cdot 10^{- 8}$	$≃ 10^{- 8}$	$≃ 10^{- 5}$	$≃ 3.2$	$≃ 9 \cdot 10^{- 8}$	$≃ 10^{- 4}$	$≃ 10^{- 1}$	$≃ 100$	$≃ 1 \cdot 10^{- 9}$	$≃ 10^{- 2}$	$≃ 10^{- 1}$	$≃ 0.73$
${BLN}_{3}$	$≃ 1 \cdot 10^{- 8}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 14$	$≃ 1 \cdot 10^{- 7}$	$≃ 10^{- 4}$	$≃ 10^{- 2}$	$≃ 388$	$≃ 1 \cdot 10^{- 9}$	$≃ 10^{- 7}$	$≃ 10^{- 6}$	$≃ 11$
${BLN}_{4}$	$≃ 2 \cdot 10^{- 8}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 71$	$≃ 5 \cdot 10^{- 8}$	$≃ 10^{- 2}$	$≃ 3$	$≃ 1538$	$≃ 9 \cdot 10^{- 10}$	$≃ 20$	$≃ 6 \cdot 10^{- 1}$	$≃ 1.3$
${BLN}_{5}$	$≃ 2 \cdot 10^{- 9}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 200$	$≃ 4 \cdot 10^{- 7}$	$≃ 10^{- 1}$	$≃ 6$	$≃ 3633$	$≃ 6 \cdot 10^{- 10}$	$≃ 10^{- 7}$	$≃ 10^{- 8}$	$≃ 5.7$
${BLN}_{6}$	$≃ 2 \cdot 10^{- 9}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 382$	$≃ 3 \cdot 10^{- 8}$	$≃ 10^{- 2}$	$≃ 3$	$≃ 5980$	$≃ 6 \cdot 10^{- 10}$	$≃ 10^{- 3}$	$≃ 10^{- 3}$	$≃ 550$
${BLN}_{7}$	$≃ 5 \cdot 10^{- 8}$	$≃ 10^{- 9}$	$≃ 10^{- 4}$	$≃ 648$	$≃ 4 \cdot 10^{- 7}$	$≃ 10^{- 2}$	$≃ 2$	$≃ 10177$	$≃ 5 \cdot 10^{- 10}$	$≃ 10^{- 2}$	$≃ 10^{- 1}$	$≃ 36$
${BLN}_{8}$	$≃ 5 \cdot 10^{- 12}$	$≃ 10^{- 10}$	$≃ 10^{- 6}$	$≃ 1188$	$≃ 3 \cdot 10^{- 7}$	$≃ 10^{- 4}$	$≃ 10^{- 1}$	$≃ 15888$	$≃ 5 \cdot 10^{- 10}$	$≃ 10^{- 2}$	$≃ 10^{- 1}$	$≃ 70$

Open in a new tab

The results on accuracy and speed indicate that Newton’s and the fixed-point method compete, outperforming the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported.

Table 7.

Performance of Newton’s, quasi-Newton and the fixed-point algorithm to solve the system of equations defining the (directed version of the) CReM, on a set of real-world WDNs. All algorithms stop because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied.

	N	L	Newton			Quasi-Newton			Fixed-point
	N	L	MRDE	MRSE	Time (s)	MRDE	MRSE	Time (s)	MRDE	MRSE	Time (s)
e-MID 00	196	10618	$≃ 3 \cdot 10^{- 7}$	$≃ 2 \cdot 10^{- 10}$	$≃ 0.9$	$≃ 5 \cdot 10^{- 7}$	$≃ 2 \cdot 10^{- 6}$	$≃ 0.9$	$≃ 4 \cdot 10^{- 14}$	$≃ 8 \cdot 10^{- 5}$	$≃ 0.09$
e-MID 01	185	8951	$≃ 3 \cdot 10^{- 15}$	$≃ 1 \cdot 10^{- 10}$	$≃ 0.9$	$≃ 7 \cdot 10^{- 10}$	$≃ 5 \cdot 10^{- 6}$	$≃ 1$	$≃ 7 \cdot 10^{- 9}$	$≃ 1 \cdot 10^{- 4}$	$≃ 0.12$
e-MID 02	177	7252	$≃ 6 \cdot 10^{- 9}$	$≃ 2 \cdot 10^{- 13}$	$≃ 0.7$	$≃ 3 \cdot 10^{- 8}$	$≃ 5 \cdot 10^{- 6}$	$≃ 1$	$≃ 8 \cdot 10^{- 6}$	$≃ 3 \cdot 10^{- 1}$	$≃ 0.08$
e-MID 03	179	6814	$≃ 1 \cdot 10^{- 12}$	$≃ 2 \cdot 10^{- 10}$	$≃ 0.8$	$≃ 3 \cdot 10^{- 9}$	$≃ 3 \cdot 10^{- 3}$	$≃ 0.7$	$≃ 4 \cdot 10^{- 15}$	$≃ 8 \cdot 10^{- 4}$	$≃ 0.1$
e-MID 04	180	6136	$≃ 9 \cdot 10^{- 10}$	$≃ 2 \cdot 10^{- 7}$	$≃ 0.9$	$≃ 5 \cdot 10^{- 15}$	$≃ 8 \cdot 10^{- 5}$	$≃ 0.8$	$≃ 5 \cdot 10^{- 9}$	$≃ 6 \cdot 10^{- 4}$	$≃ 0.13$
e-MID 05	176	6203	$≃ 3 \cdot 10^{- 10}$	$≃ 5 \cdot 10^{- 9}$	$≃ 0.7$	$≃ 2 \cdot 10^{- 14}$	$≃ 2 \cdot 10^{- 3}$	$≃ 0.7$	$≃ 5 \cdot 10^{- 9}$	$≃ 1 \cdot 10^{- 3}$	$≃ 0.2$
e-MID 06	177	6132	$≃ 1 \cdot 10^{- 10}$	$≃ 7 \cdot 10^{- 12}$	$≃ 0.7$	$≃ 2 \cdot 10^{- 13}$	$≃ 3 \cdot 10^{- 3}$	$≃ 0.8$	$≃ 8 \cdot 10^{- 11}$	$≃ 5 \cdot 10^{- 1}$	$≃ 0.14$
e-MID 07	178	6330	$≃ 3 \cdot 10^{- 6}$	$≃ 3 \cdot 10^{- 13}$	$≃ 1$	$≃ 1 \cdot 10^{- 7}$	$≃ 8 \cdot 10^{- 6}$	$≃ 1.2$	$≃ 7 \cdot 10^{- 12}$	$≃ 7 \cdot 10^{- 1}$	$≃ 0.14$
e-MID 08	173	4767	$≃ 3 \cdot 10^{- 10}$	$≃ 8 \cdot 10^{- 13}$	$≃ 0.8$	$≃ 1 \cdot 10^{- 9}$	$≃ 1 \cdot 10^{- 3}$	$≃ 0.7$	$≃ 3 \cdot 10^{- 9}$	$≃ 8 \cdot 10^{- 1}$	$≃ 0.07$
e-MID 09	156	2961	$≃ 4 \cdot 10^{- 11}$	$≃ 3 \cdot 10^{- 12}$	$≃ 0.6$	$≃ 1 \cdot 10^{- 7}$	$≃ 9 \cdot 10^{- 5}$	$≃ 0.7$	$≃ 8 \cdot 10^{- 10}$	$≃ 2 \cdot 10^{- 3}$	$≃ 0.11$
e-MID 10	135	2743	$≃ 2 \cdot 10^{- 11}$	$≃ 2 \cdot 10^{- 9}$	$≃ 0.7$	$≃ 7 \cdot 10^{- 13}$	$≃ 5 \cdot 10^{- 5}$	$≃ 0.5$	$≃ 5 \cdot 10^{- 9}$	$≃ 2 \cdot 10^{- 1}$	$≃ 0.05$

Open in a new tab

For what concerns accuracy, the two most accurate methods are Newton’s and the quasi-Newton one; for what concerns speed, the fastest method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.

The quasi-Newton method, on the other hand, is very accurate on the degrees (as already observed in the UBCM case) but not so accurate in reproducing the weighted constraints (its largest maximum errors span intervals, across all configurations, that amount at ${MRDE}_{Quasi-Newton} ≃ 10^{- 7}$ and $10^{- 6} ≲ {MRSE}_{Quasi-Newton} ≲ 6$ ). Moreover, it scales even worse than Newton’s method with the size of the network on which it is tested (the amount of time, measured in seconds, required by it to achieve convergence spans an interval, across all configurations, that amounts at $5 \leq T_{Quasi-Newton} \leq 15888$ ).

The performance of the fixed-point recipe is, somehow, intermediate between that of Newton’s and the quasi-Newton method. For what concerns accuracy, it is more accurate in reproducing the binary constraints than in reproducing the weighted ones (its largest maximum errors span intervals, across all configurations, that amount at ${MRDE}_{fixed-point} ≃ 10^{- 9}$ and $10^{- 8} ≲ {MRSE}_{fixed-point} ≲ 10^{- 1}$ ) although it outperforms Newton’s method, sometimes. For what concerns scalability, the fixed-point method is the less sensitive one to the growing size of the considered configurations: hence, it is also the fastest one (the amount of time, measured in seconds, required by it to achieve convergence spans an interval, across all configurations, that amounts at $0.01 \leq T_{fixed-point} \leq 550$ ).

Moreover, while Newton’s and the fixed-point method stop because the condition on the norm of the likelihood is satisfied, the quasi-Newton method is often found to satisfy the limit condition on the number of steps (i.e. it runs for 10000 steps and, then, stops).

Interestingly, the fact that the CReM cannot be reduced (at least not to a comparable extent with the one characterizing purely binary models) reveals a dependence on the network size of Newton’s and the quasi-Newton algorithms. The reason may lie in the evidence that both Newton’s and the quasi-Newton method require (some proxy of) the Hessian matrix of the system of equations defining the CReM to update the value of the parameters: as already observed, the order of the latter - which is $O (N^{2})$ for Newton’s method and O(N) for the quasi-Newton one - can make its calculation (very) time demanding.

Let us now move to comment on the performance of our algorithms when applied to solve the directed version of the CReM (see Table 7). Overall, all methods perform much better than in the undirected case, stopping because the condition on the norm of the likelihood is satisfied.

In fact, all of them are very accurate in reproducing the purely binary constraints, their largest maximum errors spanning intervals, across all configurations, that amount at $10^{- 12} ≲ {MRDE}_{Newton} ≲ 10^{- 6}$ , $10^{- 14} ≲ {MRDE}_{Quasi-Newton} ≲ 10^{- 6}$ and $10^{- 15} ≲ {MRDE}_{fixed-point} ≲ 10^{- 8}$ ; for what concerns the weighted constraints, instead, the two most accurate methods are Newton’s and the quasi-Newton one, their largest maximum errors spanning intervals, across all configurations, that amount at $10^{- 13} ≲ {MRSE}_{Newton} ≲ 10^{- 7}$ and $10^{- 6} ≲ {MRSE}_{Quasi-Newton} ≲ 10^{- 3}$ (the fixed-point method performs worse than them, since $10^{- 3} ≲ {MRSE}_{fixed-point} ≲ 10^{- 1}$ ).

For what concerns speed, the amount of time, measured in seconds, required by Newton’s, the quasi-Newton and the fixed-point algorithms to achieve convergence spans an interval, across all configurations, that amounts at $0.6 \leq T_{Newton}^{reduced} \leq 1$ , $0.5 \leq T_{Quasi-Newton}^{reduced} \leq 1.2$ and $0.05 \leq T_{fixed-point}^{reduced} \leq 0.2$ , respectively. Hence, all methods are also very fast - the fixed-point being systematically the fastest one.

As already stressed above, the fact that the e-MID number of nodes remains approximately constant throughout the considered time interval masks the strong dependence of the performance of Newton’s and the quasi-Newton method on the network size.

Lastly, while rescaling the system of equations defining the CReM improves neither the accuracy nor the speed of any of the three algorithms considered here, differences in their speed of convergence, caused by the choice of a particular set of initial conditions, are observable: the ‘uniform’ prescription outperforms the other ones (for both the undirected and the directed version of the CReM).

As usual, let us comment on the algorithm to sample the CReM ensemble - for the sake of simplicity, on the undirected cae. As for the UECM it can be compactly achieved by implementing a two-step procedure, the only difference lying in the functional form of the distribution from which weights are sampled. Given a specific pair of vertices i, j (with $i < j$ ), the first link can be drawn from a Bernoulli distribution with probability $p_{ij}^{UBCM}$ while the remaining $(w - 1)$ ones can be drawn from an exponential distribution whose parameter reads $θ_{i} + θ_{j}$ . The computational complexity of the sampling process is, again, $O (N^{2})$ and the pseudo-code for explicitly sampling the CReM ensemble is summed up by Algorithm 6. graphic file with name 41598_2021_93830_Figf_HTML.jpg

Discussion

The exercises carried out so far have highlighted a number of (stylized) facts concerning the performance of the three algorithms tested: in what follows, we will briefly sum them up.

Newton’s method Overall, Newton’s method is very accurate - often, the most accurate one - in reproducing both the binary and the weighted constraints; moreover, it represent the only viable alternative when the most complicated models are considered (i.e. the UECM and the DECM, respectively defined by a system of 2N and 4N coupled, non-linear equations). However, the time required to run Newton’s method on a given model seems to be quite dependent on the network size, especially whenever the corresponding system of equations cannot be reduced - see the case of the undirected CReM, run on the Bitcoin Lightning Network. Since one of the reasons affecting the bad scaling of Newton’s method with the network size is the evaluation of the Hessian matrix defining a given model, this algorithm has to be preferred for largely reducible networks.

Quasi-Newton method For all the networks considered here, the quasi-Newton method we have implemented is nothing else than the diagonal version of the traditional Newton’s method. Even if this choice greatly reduces the number of entries of the Hessian matrix which are needed (i.e. just N elements for the undirected version of the CReM, 2N elements for the UECM and the directed version of the CReM and 4N elements for the DECM) dimensionality may still represent an issue to achieve fast convergence. Moreover, since the diagonal approximation of the Hessian matrix is not necessarily always a good one, the quasi-Newton method may require more time than Newton’s one to achieve the same level of accuracy in reproducing the constraints. However, when such an approximation is a good one, the ‘regime’ in which the quasi-Newton method outperforms the competitors seems to be the one of small, non-reducible networks (e.g. see the results concerning the DBCM run on the WTW) - althogh, in cases like these, Newton’s method may still be a strong competitor.

Fixed-point method From a purely theoretical point of view, the fixed-point recipe is the fastest one, since the time required to evaluate the generic n-th step is (only) due to the evaluation of the model-specific map at the $(n - 1)$ -th iteration. Strictly speaking, however, this holds true for a single step: if the number of steps required for convergence is large, in fact, the total amount of time required by the fixed-point method can be large as well. Overall, however, this algorithm has to be preferred for large, non-reducible networks: this is the case of the (undirected version of the) CReM, run on the 8-th snapshot of the Bitcoin Lightning Network (i.e. day 17-07-19) and requiring a little more than one minute to achieve an accuracy of ${MRDE}_{fixed-point} ≳ 10^{- 10}$ and ${MRSE}_{fixed-point} ≃ 10^{- 1}$ ; naturally, the method is not as accurate as Newton’s one, for which ${MRDE}_{Newton} ≳ 10^{- 12}$ and ${MRSE}_{Newton} ≃ 10^{- 6}$ but is much faster as Newton’s algorithm requires $≃ 1188$ seconds to converge.

The ‘NEMTROPY’ Python package As an additional result, we release a comprehensive package, coded in Python, that implements the three aforementioned algorithms on all the ERGMs considered in the present work. Its name is ‘NEMTROPY’, an acronym standing for ‘Network Entropy Maximization: a Toolbox Running On Python’, and is freely downloadable at the following URL: https://pypi.org/project/NEMtropy/.

Alternative techniques to improve accuracy and speed have been tested as well, as the one of coupling two of the algorithms considered above. In particular, we have tried to solve the (undirected version of the) CReM by running the fixed-point algorithm and using the solution of the latter as input for the quasi-Newton method. The results are reported in Table 8: as they clearly show, the coupled algorithm is indeed more accurate that the single methods composing it and much faster than the quasi-Newton one (for some snapshots, more accurate and even faster than Newton’s method). Techniques like these are, in general, useful to individuate better initial conditions than the completely random ones: a first run of the fastest method may be, in fact, useful to direct the most accurate algorithm towards the (best) solution. This is indeed the case, upon considering that the quasi-Newton method, now, stops because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied - and not for having reached the limit of 10000 steps.

Table 8.

Performance of the algorithm coupling fixed-point and quasi-Newton to solve the system of equations defining the (undirected version of the) CReM, on a set of real-world WUNs. The algorithm stops because the condition $| | \nabla L (\vec{θ}) {| |}_{2} \leq 10^{- 8}$ is satisfied.

	Fixed-point + Quasi-Newton
	MRDE	MADE	MRSE	Time (s)
BLN 24-01-18	$≃ 2 \cdot 10^{- 9}$	$≃ 1.1 \cdot 10^{- 7}$	$≃ 1 \cdot 10^{- 5}$	$≃ 0.1$
BLN 25-02-18	$≃ 1.3 \cdot 10^{- 9}$	$≃ 1.5 \cdot 10^{- 6}$	$≃ 1 \cdot 10^{- 5}$	$≃ 1.6$
BLN 30-03-18	$≃ 1 \cdot 10^{- 9}$	$≃ 1.3 \cdot 10^{- 7}$	$≃ 7.8 \cdot 10^{- 7}$	$≃ 2.2$
BLN 13-07-18	$≃ 7.5 \cdot 10^{- 10}$	$≃ 4.2 \cdot 10^{- 4}$	$≃ 5.3 \cdot 10^{- 5}$	$≃ 200$
BLN 19-12-18	$≃ 7.5 \cdot 10^{- 10}$	$≃ 1.7 \cdot 10^{- 8}$	$≃ 1.1 \cdot 10^{- 9}$	$≃ 7$
BLN 30-01-19	$≃ 6.2 \cdot 10^{- 10}$	$≃ 1.8 \cdot 10^{- 5}$	$≃ 4.1 \cdot 10^{- 6}$	$≃ 614$
BLN 01-03-19	$≃ 5.7 \cdot 10^{- 10}$	$≃ 5.4 \cdot 10^{- 6}$	$≃ 9.1 \cdot 10^{- 6}$	$≃ 961$
BLN 17-07-19	$≃ 4.9 \cdot 10^{- 10}$	$≃ 1.3 \cdot 10^{- 3}$	$≃ 3.5 \cdot 10^{- 3}$	$≃ 3350$

Open in a new tab

As the results reveal, it is more accurate that the single methods composing it and much faster than the quasi-Newton one - for some snapshots, more accurate and even faster than Newton’s method. Only the results corresponding to the best choice of initial conditions are reported.

We would like to end the discussion about the results presented in this contribution by explicitly mentioning a circumstance that is frequently met when studying economic and financial networks. When considering systems like these, the information about the number of neighbours of each node is typically not accessible: as a consequence, the models constraining both binary and weighted information cannot be employed as they have presented in this contribution.

Alternatives exist and rest upon the existence of some kind of relationship between binary and weighted constraints. In the case of undirected networks, such a relationship is usually written as

\begin{matrix} e^{- θ_{i}} = \sqrt{z} s_{i} \end{matrix}

and establishes that the Lagrange multipliers controlling for the degrees are linearly proportional to the strengths. If this is the case (or a valid reason exists for this to be the case), the expression for the probability that any two nodes are connected becomes

\begin{matrix} p_{ij}^{dcGM} = \frac{z s_{i} s_{j}}{1 + z s_{i} s_{j}} \forall i < j \end{matrix}

the acronym standing for degree-corrected Gravity Model⁴⁴. The (only) unknown parameter z must be numerically estimated by employing some kind of topological information; this is usually represented by (a proxy of) the network link density, used to instantiate the (only) likelihood condition

\begin{matrix} L (A^{*}) = ⟨ L ⟩ = \sum_{i = 1}^{N} \sum_{\begin{matrix} j = 1 \\ (j < i) \end{matrix}}^{N} \frac{z s_{i} s_{j}}{1 + z s_{i} s_{j}} ; \end{matrix}

once the equation above has been solved, the set of coefficients ${p_{ij}^{dcGM}}_{i, j = 1}^{N}$ can be either employed 1) to, first, estimate the degrees and, then, solve the UECM⁴⁵ or 2) within the CReM framework, via the identification $f_{ij} \equiv p_{ij}^{dcGM}$ , to estimate the parameters controlling for the weighted constraints.

Conclusions

The definition and correct implementation of null models is a crucial issue in network analysis: the present contribution focuses on (a subset of) the ones constituting the so-called ERG framework - a choice driven by the evidence that they are the most commonly employed ones for tasks as different as network reconstruction, pattern detection, graph enumeration. The optimization of the likelihood function associated to them is, however, still problematic since it involves the resolution of large systems of coupled, non-linear equations.

Here, we have implemented and compared three algorithms for numerical optimization, with the aim of finding the one performing best (i.e. being both accurate and fast) for each model. What emerges from our results is that there is no a unique method which is both accurate and fast for all models on all configurations: under this respect, performance is just a trade-off between accuracy and speed. However, some general conclusions can still be drawn.

Newton’s method is the one requiring the largest amount of information per step (in fact, the entire Hessian matrix has to be evaluated at each iteration): hence, it is the most accurate one but, for the same reason, often the one characterized by the slowest performance. A major drawback of Newton’s method is that of scaling very badly with the network size.

At the opposite extreme lies the fixed-point algorithm, theoretically the fastest one but, often, among the least accurate ones (at least, for what concerns the weighted constraints); the performance if the quasi-Newton method often lies in-between the performances of the two methods above, by achieving an accuracy that is larger than the one achieved by the fixed-point algorithm, requiring less time that the one needed by Newton’s method.

Overall, while Newton’s method seems to perform best on either relatively small or greatly reducible networks, the fixed-point method must be preferred for large, non-reducible configurations. Deviations from this (over simplified) picture are, however, clearly visible.

Future work concerns the application of the aforementioned three numerical recipes to the models that have not found place here. For what concerns the set of purely binary constraints, the ones defining the Reciprocal Binary Configuration Model (RBCM)¹⁵ deserve to be mentioned.

For what concerns the ‘mixed’ constraints, instead, the CReM framework is versatile enough to accommodate a quite large number of variants. In the present work, we have ‘limited’ ourselves to combine the UBCM and the DBCM with (conditional) distributions of continuous weights: a first, obvious, generalization is that of considering the discrete versions of such models, defined by the positions

\begin{matrix} {⟨ w_{ij} ⟩}_{d-CReM}^{und} = \frac{f_{ij}}{1 - e^{- β_{j} - β_{i}}} \end{matrix}

with $f_{ij} \equiv p_{ij}^{UBCM}$ and

\begin{matrix} {⟨ w_{ij} ⟩}_{d-CReM}^{dir} = \frac{f_{ij}}{1 - e^{- γ_{j} - δ_{i}}} \end{matrix}

with $f_{ij} \equiv p_{ij}^{DBCM}$ ; a second one concerns the continuous versions of the UECM and of the DECM, respectively defined by the positions¹⁹

\begin{matrix} p_{ij}^{UECM} = \frac{e^{- α_{i} - α_{j}}}{β_{i} + β_{j} + e^{- α_{i} - α_{j}}} \end{matrix}

and

\begin{matrix} p_{ij}^{DECM} = \frac{e^{- α_{i} - β_{j}}}{γ_{i} + δ_{j} + e^{- α_{i} - β_{j}}} . \end{matrix}

Supplementary Information

Supplementary Information.^{(259.2KB, pdf)}

Acknowledgements

FS and TS acknowledge support from the European project SoBigData++ (GA. 871042). GC and MZ acknowledge support from the PAI project ORCS (‘Optimized Reconstruction of Complex networkS’), funded by the IMT School for Advanced Studies Lucca. FS also acknowledges support from the Italian ‘Programma di Attività Integrata’ (PAI) project ‘TOol for Fighting FakEs’ (TOFFE), funded by IMT School for Advanced Studies Lucca.

Appendix A: computing the Hessian matrix

As we showed in the main text, the Hessian matrix of our likelihood function is ‘minus’ the covariance matrix of the constraints, i.e.

\begin{matrix} H_{ij} = \frac{\partial^{2} L (\vec{θ})}{\partial θ_{i} \partial θ_{j}} = - Cov [C_{i}, C_{j}], i, j = 1 \dots M ; \end{matrix}

interestingly, a variety of alternative methods exists to explicitly calculate the generic entry $H_{ij}$ , i.e. 1) taking the second derivatives of the likelihood function characterizing the method under analysis, 2) taking the first derivatives of the expectation values of the constraints characterizing the method under analysis, 3) calculating the moments of the pair-specific probability distributions characterizing each method.

UBCM: binary undirected graphs with given degree sequence

The Hessian matrix for the UBCM is an $N \times N$ symmetric table with entries reading

\begin{matrix} H_{UBCM} = \{\begin{matrix} Var [k_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij} (1 - p_{ij}), \forall i \\ Cov [k_{i}, k_{j}] = p_{ij} (1 - p_{ij}), \forall i \neq j \end{matrix}) \end{matrix}

where $p_{ij} \equiv p_{ij}^{UBCM}$ . Notice that $Var [k_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} Cov [k_{i}, k_{j}], \forall i$ .

DBCM: binary directed graphs with given in-degree and out-degree sequences

The Hessian matrix for the DBCM is a $2 N \times 2 N$ symmetric table that can be further subdivided into four $N \times N$ blocks whose entries read

\begin{matrix} H_{DBCM} = \{\begin{matrix} Var [k_{i}^{out}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij} (1 - p_{ij}), \forall i \\ Var [k_{i}^{in}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ji} (1 - p_{ji}), \forall i \\ Cov [k_{i}^{out}, k_{j}^{in}] = p_{ij} (1 - p_{ij}), \forall i \neq j \\ Cov [k_{j}^{out}, k_{i}^{in}] = p_{ji} (1 - p_{ji}), \forall i \neq j \end{matrix}) \end{matrix}

while $Cov [k_{i}^{out}, k_{i}^{in}] = Cov [k_{i}^{out}, k_{j}^{out}] = Cov [k_{i}^{in}, k_{j}^{in}] = 0$ and $p_{ij} \equiv p_{ij}^{DBCM}$ .

Notice that the Hessian matrix of the BiCM mimicks the DBCM one, the only difference being that the probability coefficients are now indexed by i and $α$ : for example, in the BiCM case, one has that $Cov [k_{i}, d_{α}] = p_{i α} (1 - p_{i α})$ , $\forall i, α$ .

UECM: weighted undirected graphs with given strengths and degrees

The Hessian matrix for the UECM is a $2 N \times 2 N$ symmetric table that can be further subdivided into four blocks (each of which with dimensions $N \times N$ ). In order to save space, the expressions indexed by the single subscript i will be assumed as being valid $\forall i$ , while the ones indexed by a double subscript i, j will be assumed as being valid $\forall i \neq j$ . The entries of the diagonal blocks read

\begin{matrix} H_{UECM} = \{\begin{matrix} \frac{\partial^{2} L_{UECM}}{\partial α_{i}^{2}} = Var [k_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij} (1 - p_{ij}) \\ \frac{\partial^{2} L_{UECM}}{\partial α_{i} α_{j}} = Cov [k_{i}, k_{j}] = p_{ij} (1 - p_{ij}) \end{matrix}) \end{matrix}

and

\begin{matrix} H_{UECM} = \{\begin{matrix} \frac{\partial^{2} L_{UECM}}{\partial β_{i}^{2}} = Var [s_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ij} (1 - p_{ij} + e^{- β_{i} - β_{j}})}{{(1 - e^{- β_{i} - β_{j}})}^{2}} \\ \frac{\partial^{2} L_{UECM}}{\partial β_{i} β_{j}} = Cov [s_{i}, s_{j}] = \frac{p_{ij} (1 - p_{ij} + e^{- β_{i} - β_{j}})}{{(1 - e^{- β_{i} - β_{j}})}^{2}} \end{matrix}) \end{matrix}

where $p_{ij} \equiv p_{ij}^{UECM}$ . On the other hand, the entries of the off-diagonal blocks read

\begin{matrix} H_{UECM} = \{\begin{matrix} \frac{\partial^{2} L_{UECM}}{\partial α_{i} \partial β_{i}} = Cov [k_{i}, s_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ij} (1 - p_{ij})}{1 - e^{- β_{i} - β_{j}}} \\ \frac{\partial^{2} L_{UECM}}{\partial α_{i} \partial β_{j}} = Cov [k_{i}, s_{j}] = \frac{p_{ij} (1 - p_{ij})}{1 - e^{- β_{i} - β_{j}}} \end{matrix}) \end{matrix}

with $p_{ij} \equiv p_{ij}^{UECM}$ .

DECM: weighted directed graphs with given strengths and degrees

The Hessian matrix for the DECM is a $4 N \times 4 N$ symmetric table that can be further subdivided into four blocks (each of which with dimensions $N \times N$ ). As for the UECM, in order to save space, the expressions indexed by the single subscript i will be assumed as being valid $\forall i$ , while the ones indexed by a double subscript i, j will be assumed as being valid $\forall i \neq j$ . The entries of the diagonal blocks read

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial α_{i}^{2}} = Var [k_{i}^{out}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ij} (1 - p_{ij}) \\ \frac{\partial^{2} L_{DECM}}{\partial α_{i} α_{j}} = Cov [k_{i}^{out}, k_{j}^{out}] = 0 \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial β_{i}^{2}} = Var [k_{i}^{in}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} p_{ji} (1 - p_{ji}) \\ \frac{\partial^{2} L_{DECM}}{\partial β_{i} β_{j}} = Cov [k_{i}^{in}, k_{j}^{in}] = 0 \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial γ_{i}^{2}} = Var [s_{i}^{out}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ij} (1 - p_{ij} + e^{- γ_{i} - δ_{j}})}{{(1 - e^{- γ_{i} - δ_{j}})}^{2}} \\ \frac{\partial^{2} L_{DECM}}{\partial γ_{i} γ_{j}} = Cov [s_{i}^{out}, s_{j}^{out}] = 0 \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial δ_{i}^{2}} = Var [s_{i}^{in}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ji} (1 - p_{ji} + e^{- γ_{j} - δ_{i}})}{{(1 - e^{- γ_{j} - δ_{i}})}^{2}} \\ \frac{\partial^{2} L_{DECM}}{\partial δ_{i} δ_{j}} = Cov [s_{i}^{in}, s_{j}^{in}] = 0 \end{matrix}) \end{matrix}

where $p_{ij} \equiv p_{ij}^{DECM}$ . On the other hand, the entries of the off-diagonal blocks read

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial β_{i}} = Cov [k_{i}^{out}, k_{i}^{in}] = 0 \\ \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial β_{j}} = Cov [k_{i}^{out}, k_{j}^{in}] = p_{ij} (1 - p_{ij}) \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial γ_{i}} = Cov [k_{i}^{out}, s_{i}^{out}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ij} (1 - p_{ij})}{1 - e^{- γ_{i} - δ_{j}}} \\ \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial γ_{j}} = Cov [k_{i}^{out}, s_{j}^{out}] = 0 \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial δ_{i}} = Cov [k_{i}^{out}, s_{i}^{in}] = 0 \\ \frac{\partial^{2} L_{DECM}}{\partial α_{i} \partial δ_{j}} = Cov [k_{i}^{out}, s_{j}^{in}] = \frac{p_{ij} (1 - p_{ij})}{1 - e^{- γ_{i} - δ_{j}}} \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial β_{i} \partial γ_{i}} = Cov [k_{i}^{in}, s_{i}^{out}] = 0 \\ \frac{\partial^{2} L_{DECM}}{\partial β_{i} \partial γ_{j}} = Cov [k_{i}^{in}, s_{j}^{out}] = \frac{p_{ji} (1 - p_{ji})}{1 - e^{- γ_{j} - δ_{i}}} \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial β_{i} \partial δ_{i}} = Cov [k_{i}^{in}, s_{i}^{in}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{p_{ji} (1 - p_{ji})}{1 - e^{- γ_{j} - δ_{i}}} \\ \frac{\partial^{2} L_{DECM}}{\partial β_{i} \partial δ_{j}} = Cov [k_{i}^{in}, s_{j}^{in}] = 0 \end{matrix}) \end{matrix}

and

\begin{matrix} H_{DECM} = \{\begin{matrix} \frac{\partial^{2} L_{DECM}}{\partial γ_{i} \partial δ_{i}} = Cov [s_{i}^{out}, s_{i}^{in}] = 0 \\ \frac{\partial^{2} L_{DECM}}{\partial γ_{i} \partial δ_{j}} = Cov [s_{i}^{out}, s_{j}^{in}] = \frac{p_{ij} (1 - p_{ij} + e^{- γ_{i} - δ_{j}})}{{(1 - e^{- γ_{i} - δ_{j}})}^{2}} \end{matrix}) \end{matrix}

100

with $p_{ij} \equiv p_{ij}^{DECM}$ .

Two-step models for undirected and directed networks

The Hessian matrix for the undirected two-step model considered here is an $N \times N$ symmetric table reading

\begin{matrix} H_{CReM}^{und} = \{\begin{matrix} Var [s_{i}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{f_{ij}}{{(θ_{i} + θ_{j})}^{2}}, \forall i \\ Cov [s_{i}, s_{j}] = \frac{f_{ij}}{{(θ_{i} + θ_{j})}^{2}}, \forall i \neq j \end{matrix}) \end{matrix}

101

where $f_{ij}$ is given. In the directed case, instead, the Hessian matrix for the two-step model considered here is a $2 N \times 2 N$ symmetric table that can be further subdivided into four $N \times N$ blocks whose entries read

\begin{matrix} H_{CReM}^{dir} \{\begin{matrix} Var [s_{i}^{out}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{f_{ij}}{{(α_{i} + β_{j})}^{2}}, \forall i \\ Var [s_{i}^{in}] = \sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{f_{ji}}{{(α_{j} + β_{i})}^{2}}, \forall i \\ Cov [s_{i}^{out}, s_{j}^{in}] = \frac{f_{ij}}{{(α_{i} + β_{j})}^{2}}, \forall i \neq j \end{matrix}) \end{matrix}

102

while $Cov [s_{i}^{out}, s_{i}^{in}] = Cov [s_{i}^{out}, s_{j}^{out}] = Cov [s_{i}^{in}, s_{j}^{in}] = 0$ and $f_{ij}$ is given.

Appendix B: a note on the change of variables

In all methods we will considered in the present work, the variable $θ_{i}$ appears in the optimality conditions only through negative exponential functions: it is therefore tempting to perform the change of variable $x_{i} \equiv e^{- θ_{i}}$ . Although this is often performed in the literature, one cannot guarantee that the new optimization problem remains convex: in fact, simple examples can be provided for which convexity is lost. This has several consequences, e.g. (1) convergence to the global maximum is no longer guaranteed (since the existence of a global maximum is no longer guaranteed as well), (2) extra-care is needed to guarantee that the Hessian matrix $H$ employed in our algorithms is negative definite. While problem (2) introduces additional complexity only for Newton’s method, problem (1) is more serious from a theoretical point of view.

Let us now address problem (1) in more detail. First, it is possible to prove that any stationary point for $L (\vec{x})$ satisfies the optimality conditions for $L (\vec{θ})$ as well. In fact, the application of the ‘chain rule’ leads to recover the set of relationships

\begin{matrix} \frac{\partial L (\vec{θ})}{\partial θ_{i}} = \frac{\partial x_{i}}{\partial θ_{i}} \frac{\partial L (\vec{x})}{\partial x_{i}} = - x_{i} \frac{\partial L (\vec{x})}{\partial x_{i}}, i = 1 \dots M ; \end{matrix}

103

notice that requiring $\nabla_{θ_{i}} L (\vec{θ}) = 0$ leads to require that either $\nabla_{x_{i}} L (\vec{x}) = 0$ or $x_{i} = 0$ . As the second eventuality precisely identifies isolated nodes (i.e. the nodes for which the constraint $C_{i} (G^{*})$ , controlled by the multiplier $θ_{i}$ , is 0), one can get rid of it by explicitly removing the corresponding addenda from the likelihood function.

For what concerns convexity, let us explicitly calculate the Hessian matrix for the set of variables ${x_{i}}_{i = 1}^{M}$ . In formulas,

\begin{matrix} \frac{\partial^{2} L (\vec{x})}{\partial x_{i}^{2}} = & e^{2 θ_{i}} (\frac{\partial^{2} L (\vec{θ})}{\partial θ_{i}^{2}} + \frac{\partial L (\vec{θ})}{\partial θ_{i}}), i = 1 \dots M, \\ \frac{\partial^{2} L (\vec{x})}{\partial x_{i} \partial x_{j}} = & e^{θ_{i} + θ_{j}} (\frac{\partial^{2} L (\vec{θ})}{\partial θ_{i} \partial θ_{j}}), \forall i \neq j \end{matrix}

104

according to the ‘chain rule’ for second-order derivatives. More compactly,

\begin{matrix} H_{L (\vec{x})} = e^{Θ} \circ (- Cov [C_{i}, C_{j}] + I \cdot \nabla_{\vec{θ}} L (\vec{θ})) \end{matrix}

105

where $I$ is the identity matrix, the generic entry of the matrix $e^{Θ}$ reads ${[e^{Θ}]}_{ij} \equiv e^{θ_{i} + θ_{j}}$ , $\forall i, j$ and the symbol ‘ $\circ$ ’ indicates the Hadamard (i.e. element-wise) product of matrices. In general, the expression above defines an indefinite matrix, i.e. a neither positive nor negative (semi)definite one.

Appendix C: fixed point method in the multivariate case

Equation (21) can be written as

\begin{matrix} θ_{i}^{(n)} = G_{i} ({\vec{θ}}^{(n - 1)}), i = 1 \dots N ; \end{matrix}

106

for the sake of illustration, let us discuss it for the UBCM case. In this particular case, the set of equations above can be rewritten as

\begin{matrix} θ_{i}^{(n)} = - ln [\frac{k_{i} (A^{*})}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{j}^{(n - 1)}}}{1 + e^{- θ_{i}^{(n - 1)} - θ_{j}^{(n - 1)}}})}], i = 1 \dots N . \end{matrix}

107

Since all components of the map $G$ are continuous on $R^{N}$ , the map itself is continuous on $R^{N}$ . Hence, a fixed point exists. Let us now consider its Jacobian matrix and check the magnitude of its elements. In the UBCM case, one finds that

\begin{matrix} \frac{\partial G_{i}}{\partial θ_{i}} = & \frac{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} \frac{e^{- θ_{i} - 2 θ_{j}}}{{(1 + e^{- θ_{i} - θ_{j}})}^{2}}}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})} = \frac{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} {(\frac{e^{- θ_{i} - θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})}^{2}}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{i} - θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})} \\ = & 1 - \frac{Var [k_{i}]}{⟨ k_{i} ⟩} = 1 - \frac{\sum_{j \neq i} Cov [k_{i}, k_{j}]}{⟨ k_{i} ⟩}, \forall i \end{matrix}

108

and

\begin{matrix} \frac{\partial G_{i}}{\partial θ_{j}} = & - \frac{\frac{e^{- θ_{j}}}{{(1 + e^{- θ_{i} - θ_{j}})}^{2}}}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})} = - \frac{\frac{e^{- θ_{i} - θ_{j}}}{{(1 + e^{- θ_{i} - θ_{j}})}^{2}}}{\sum_{\begin{matrix} j = 1 \\ (j \neq i) \end{matrix}}^{N} (\frac{e^{- θ_{i} - θ_{j}}}{1 + e^{- θ_{i} - θ_{j}}})} \\ = & - \frac{Cov [k_{i}, k_{j}]}{⟨ k_{i} ⟩}, \forall i, j . \end{matrix}

109

Let us notice that (1) each element of the Jacobian matrix is a continuous function $R^{N} \to R$ and that (2) the following relationships hold

\begin{matrix} |\frac{\partial G_{i}}{\partial θ_{j}}| \leq 1, \forall i, j ; \end{matrix}

110

unfortunately, however, when multivariate functions are considered, the set of conditions above is not enough to ensure convergence to the fixed point for any choice of the initial value of the parameters. What is needed to be checked is the condition $| | J_{G} (\vec{θ}) | | < 1$ , with J indicating the Jacobian of the map (i.e. the matrix of the first, partial derivatives above) and ||.|| any natural matrix norm: the validity of such a condition has been numerically verified case by case.

Author contributions

F.S., T.S., G.C., M.Z. developed the methods and designed the research. N.V., E.M., M.B., G.T. performed the analysis (N.V.: DBCM, DECM, RBCM; E.M.: UBCM, UECM, CReM; M.B.: BiCM; G.T.: preliminary version of the BiCM). F.S., T.S., G.C., M.Z. wrote the manuscript. All authors reviewed and approved the manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-93830-4.

References

1.Newman MEJ. Networks: An Introduction. Oxford University Press; 2010. [Google Scholar]
2.Colizza V, Barrat A, Barthelemy M, Vespignani A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl. Acad. Sci. 2006;103(7):2015–2020. doi: 10.1073/pnas.0510525103. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Barrat A, Barthlemy M, Vespignani A. Dynamical Processes on Complex Networks. Cambridge University Press; 2008. [Google Scholar]
4.Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. Epidemic processes in complex networks. Rev. Mod. Phys. 2015;87:925. doi: 10.1103/RevModPhys.87.925. [DOI] [Google Scholar]
5.Castellano C, Fortunato S, Loreto V. Statistical physics of social dynamics. Rev. Mod. Phys. 2009;81:591. doi: 10.1103/RevModPhys.81.591. [DOI] [Google Scholar]
6.Squartini T, van Lelyveld I, Garlaschelli D. Early-warning signals of topological collapse in interbank networks. Sci. Rep. 2013;3:3357. doi: 10.1038/srep03357. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Cimini G, Squartini T, Saracco F, Garlaschelli D, Gabrielli A, Caldarelli G. The statistical physics of real-world networks. Nat. Rev. Phys. 2019;1(1):58–71. doi: 10.1038/s42254-018-0002-6. [DOI] [Google Scholar]
8.Maslov S, Sneppen K. Specificity and stability in topology of protein networks. Science. 2002;296(5569):910–913. doi: 10.1126/science.1065103. [DOI] [PubMed] [Google Scholar]
9.Coolen ACC, De Martino A, Annibale A. Constrained Markovian dynamics of random graphs. J. Stat. Phys. 2009;136:1035–1067. doi: 10.1007/s10955-009-9821-2. [DOI] [Google Scholar]
10.Roberts ES, Coolen ACC. Unbiased degree-preserving randomization of directed binary networks. Phys. Rev. E. 2012;85(4):046103. doi: 10.1103/PhysRevE.85.046103. [DOI] [PubMed] [Google Scholar]
11.Artzy-Randrup Y, Stone L. Generating uniformly distributed random networks. Phys. Rev. E. 2005;72(5):056708. doi: 10.1103/PhysRevE.72.056708. [DOI] [PubMed] [Google Scholar]
12.Del Genio CI, Kim H, Toroczkai Z, Bassler KE, Del Genio CI, Kim H, Toroczkai Z, Bassler KE. PLoS One. 2010;5(4):e10012. doi: 10.1371/journal.pone.0010012. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Kim H, Del Genio CI, Bassler KE, Toroczkai Z. Constructing and sampling directed graphs with given degree sequences. New J. Phys. 2012;14:023012. doi: 10.1088/1367-2630/14/2/023012. [DOI] [Google Scholar]
14.Blitzstein J, Diaconis P. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Math. 2011;6(4):489–522. doi: 10.1080/15427951.2010.557277. [DOI] [Google Scholar]
15.Squartini T, Garlaschelli D. Analytical maximum-likelihood method to detect patterns in real networks. New. J. Phys. 2011;13:083001. doi: 10.1088/1367-2630/13/8/083001. [DOI] [Google Scholar]
16.Park J, Newman MEJ. Statistical mechanics of networks. Phys. Rev. E. 2004;70(6):066117. doi: 10.1103/PhysRevE.70.066117. [DOI] [PubMed] [Google Scholar]
17.Bianconi G. The entropy of randomized network ensembles. Europhys. Lett. 2007;81(2):28005. doi: 10.1209/0295-5075/81/28005. [DOI] [Google Scholar]
18.Fronczak A, Fronczak P, Holyst JA. Fluctuation-dissipation relations in complex networks. Phys. Rev. E. 2006;73(1):016108. doi: 10.1103/PhysRevE.73.016108. [DOI] [PubMed] [Google Scholar]
19.Gabrielli A, Mastrandrea R, Caldarelli G, Cimini G. Grand canonical ensemble of weighted networks. Phys. Rev. E. 2019;99(3):030301(R). doi: 10.1103/PhysRevE.99.030301. [DOI] [PubMed] [Google Scholar]
20.Fronczak Agata. Exponential random graph models. In: Reda Alhajj, Jon Rokne., editors. Encyclopedia of Social Network Analysis and Mining. 2014. New York: Springer-Verlag; 2014. [Google Scholar]
21.Jaynes ET. Information theory and statistical mechanics. Phys. Rev. 1957;106(4):620–630. doi: 10.1103/PhysRev.106.620. [DOI] [Google Scholar]
22.Dianati, N. A maximum entropy approach to separating noise from signal in bimodal affiliation networks, arXiv:1607.01735 (2016).
23.Vallarano Nicolò, Tessone Claudio J., Squartini Tiziano. Bitcoin Transaction Networks: an overview of recent results. Frontiers in Physics. 2020;8:286. doi: 10.3389/fphy.2020.00286. [DOI] [Google Scholar]
24.Garlaschelli D, Loffredo MI. Maximum likelihood: extracting unbiased information from complex networks. Phys. Rev. E. 2008;78(1):015101(R). doi: 10.1103/PhysRevE.78.015101. [DOI] [PubMed] [Google Scholar]
25.Nocedal J, Wright SJ. Numerical Optimization. Springer; 2006. [Google Scholar]
26.Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004. [Google Scholar]
27.Chung F, Lu L. Connected components in random graphs with given expected degree sequences. Ann. Combinatorics. 2002;6:125–145. doi: 10.1007/PL00012580. [DOI] [Google Scholar]
28.Oshio, K., Iwasaki, Y., Morita, S., Osana, Y., Gomi, S., Akiyama, E., Omata, K., Oka, K. & Kawamura, K. Database of Synaptic Connectivity of C. elegans, Technical Report of CCeP, Keio Future3, (Keio University, 2003).
29.Colizza V, Pastor-Satorras R, Vespignani A. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 2007;3:276–282. doi: 10.1038/nphys560. [DOI] [Google Scholar]
30.Database of Interacting Proteins and can be found at the following URL: http://dip.doe-mbi.ucla.edu/dip/Main.cgi
31.Colizza V, Flammini A, Serrano MA, Vespignani A. Detecting rich-club ordering in complex networks. Nat. Phys. 2006;2:110–115. doi: 10.1038/nphys209. [DOI] [Google Scholar]
32.Lin J-H, Primicerio K, Squartini T, Decker C, Tessone CJ. Lightning Network: a second path towards centralisation of the Bitcoin economy. New J. Phys. 2020;22:083022. doi: 10.1088/1367-2630/aba062. [DOI] [Google Scholar]
33.Miller, J. C. & Hagberg, A. Efficient generation of networks with given expected degrees, LNCS 6732. (eds Frieze, A., Horn, P. & Pralat P.) 115–126 (Springer, 2011).
34.Squartini T, Fagiolo G, Garlaschelli D. Randomizing world trade. I. A binary network analysis. Phys. Rev. E. 2011;84:046117. doi: 10.1103/PhysRevE.84.046117. [DOI] [PubMed] [Google Scholar]
35.Bovet, A., Campajola, C., Mottes, F., Restocchi, V., Vallarano, N., Squartini, T. & Tessone, C. J.The evolving liaisons between the transaction networks of Bitcoin and its price dynamics, arXiv:1907.03577 (2019).
36.Caldarelli Guido, De Nicola Rocco, Petrocchi Marinella, Pratelli Manuel, Saracco Fabio. Flow of online misinformation during the peak of the COVID-19 pandemic in Italy. EPJ Data Science. 2020 doi: 10.1140/epjds/s13688-021-00289-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Saracco F, Di Clemente R, Gabrielli A, Squartini T. Randomizing bipartite networks: the case of the World Trade Web. Sci. Rep. 2015;5:10595. doi: 10.1038/srep10595. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Mastrandrea R, Squartini T, Fagiolo G, Garlaschelli D. Intensive and extensive biases in economic networks: reconstructing world trade. New J. Phys. 2014;16:043022. doi: 10.1088/1367-2630/16/4/043022. [DOI] [PubMed] [Google Scholar]
39.Garlaschelli D, Loffredo MI. Generalized bose-fermi statistics and structural correlations in weighted networks. Phys. Rev. Lett. 2009;102(3):038701. doi: 10.1103/PhysRevLett.102.038701. [DOI] [PubMed] [Google Scholar]
40.Mastrandrea R, Squartini T, Fagiolo G, Garlaschelli D. Reconstructing the world trade multiplex: the role of intensive and extensive biases. Phys. Rev. E. 2014;90(6):062804. doi: 10.1103/PhysRevE.90.062804. [DOI] [PubMed] [Google Scholar]
41.Gleditsch K. Expanded trade and GDP data. J. Conflict Resol. 2002;46:712–24. doi: 10.1177/0022002702046005006. [DOI] [Google Scholar]
42.Iori G, De Masi G, Precup OV, Gabbi G, Caldarelli G. A network analysis of the Italian overnight money market. J. Econ. Dyn. Control. 2006;32(1):259–278. doi: 10.1016/j.jedc.2007.01.032. [DOI] [Google Scholar]
43.Parisi F, Squartini T, Garlaschelli D. A faster horse on a safer trail: generalized inference for the efficient reconstruction of weighted networks. New J. Phys. 2020;22:053053. doi: 10.1088/1367-2630/ab74a7. [DOI] [Google Scholar]
44.Cimini G, Squartini T, Gabrielli A, Garlaschelli D. Estimating topological properties of weighted networks from limited information. Phys. Rev. E. 2015;92:040802. doi: 10.1103/PhysRevE.92.040802. [DOI] [PubMed] [Google Scholar]
45.Cimini G, Squartini T, Gabrielli A, Garlaschelli D. Systemic risk analysis on reconstructed economic and financial networks. Sci. Rep. 2015;5:15758. doi: 10.1038/srep15758. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information.^{(259.2KB, pdf)}

[CR1] 1.Newman MEJ. Networks: An Introduction. Oxford University Press; 2010. [Google Scholar]

[CR2] 2.Colizza V, Barrat A, Barthelemy M, Vespignani A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl. Acad. Sci. 2006;103(7):2015–2020. doi: 10.1073/pnas.0510525103. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR3] 3.Barrat A, Barthlemy M, Vespignani A. Dynamical Processes on Complex Networks. Cambridge University Press; 2008. [Google Scholar]

[CR4] 4.Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. Epidemic processes in complex networks. Rev. Mod. Phys. 2015;87:925. doi: 10.1103/RevModPhys.87.925. [DOI] [Google Scholar]

[CR5] 5.Castellano C, Fortunato S, Loreto V. Statistical physics of social dynamics. Rev. Mod. Phys. 2009;81:591. doi: 10.1103/RevModPhys.81.591. [DOI] [Google Scholar]

[CR6] 6.Squartini T, van Lelyveld I, Garlaschelli D. Early-warning signals of topological collapse in interbank networks. Sci. Rep. 2013;3:3357. doi: 10.1038/srep03357. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR7] 7.Cimini G, Squartini T, Saracco F, Garlaschelli D, Gabrielli A, Caldarelli G. The statistical physics of real-world networks. Nat. Rev. Phys. 2019;1(1):58–71. doi: 10.1038/s42254-018-0002-6. [DOI] [Google Scholar]

[CR8] 8.Maslov S, Sneppen K. Specificity and stability in topology of protein networks. Science. 2002;296(5569):910–913. doi: 10.1126/science.1065103. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Coolen ACC, De Martino A, Annibale A. Constrained Markovian dynamics of random graphs. J. Stat. Phys. 2009;136:1035–1067. doi: 10.1007/s10955-009-9821-2. [DOI] [Google Scholar]

[CR10] 10.Roberts ES, Coolen ACC. Unbiased degree-preserving randomization of directed binary networks. Phys. Rev. E. 2012;85(4):046103. doi: 10.1103/PhysRevE.85.046103. [DOI] [PubMed] [Google Scholar]

[CR11] 11.Artzy-Randrup Y, Stone L. Generating uniformly distributed random networks. Phys. Rev. E. 2005;72(5):056708. doi: 10.1103/PhysRevE.72.056708. [DOI] [PubMed] [Google Scholar]

[CR12] 12.Del Genio CI, Kim H, Toroczkai Z, Bassler KE, Del Genio CI, Kim H, Toroczkai Z, Bassler KE. PLoS One. 2010;5(4):e10012. doi: 10.1371/journal.pone.0010012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Kim H, Del Genio CI, Bassler KE, Toroczkai Z. Constructing and sampling directed graphs with given degree sequences. New J. Phys. 2012;14:023012. doi: 10.1088/1367-2630/14/2/023012. [DOI] [Google Scholar]

[CR14] 14.Blitzstein J, Diaconis P. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Math. 2011;6(4):489–522. doi: 10.1080/15427951.2010.557277. [DOI] [Google Scholar]

[CR15] 15.Squartini T, Garlaschelli D. Analytical maximum-likelihood method to detect patterns in real networks. New. J. Phys. 2011;13:083001. doi: 10.1088/1367-2630/13/8/083001. [DOI] [Google Scholar]

[CR16] 16.Park J, Newman MEJ. Statistical mechanics of networks. Phys. Rev. E. 2004;70(6):066117. doi: 10.1103/PhysRevE.70.066117. [DOI] [PubMed] [Google Scholar]

[CR17] 17.Bianconi G. The entropy of randomized network ensembles. Europhys. Lett. 2007;81(2):28005. doi: 10.1209/0295-5075/81/28005. [DOI] [Google Scholar]

[CR18] 18.Fronczak A, Fronczak P, Holyst JA. Fluctuation-dissipation relations in complex networks. Phys. Rev. E. 2006;73(1):016108. doi: 10.1103/PhysRevE.73.016108. [DOI] [PubMed] [Google Scholar]

[CR19] 19.Gabrielli A, Mastrandrea R, Caldarelli G, Cimini G. Grand canonical ensemble of weighted networks. Phys. Rev. E. 2019;99(3):030301(R). doi: 10.1103/PhysRevE.99.030301. [DOI] [PubMed] [Google Scholar]

[CR20] 20.Fronczak Agata. Exponential random graph models. In: Reda Alhajj, Jon Rokne., editors. Encyclopedia of Social Network Analysis and Mining. 2014. New York: Springer-Verlag; 2014. [Google Scholar]

[CR21] 21.Jaynes ET. Information theory and statistical mechanics. Phys. Rev. 1957;106(4):620–630. doi: 10.1103/PhysRev.106.620. [DOI] [Google Scholar]

[CR22] 22.Dianati, N. A maximum entropy approach to separating noise from signal in bimodal affiliation networks, arXiv:1607.01735 (2016).

[CR23] 23.Vallarano Nicolò, Tessone Claudio J., Squartini Tiziano. Bitcoin Transaction Networks: an overview of recent results. Frontiers in Physics. 2020;8:286. doi: 10.3389/fphy.2020.00286. [DOI] [Google Scholar]

[CR24] 24.Garlaschelli D, Loffredo MI. Maximum likelihood: extracting unbiased information from complex networks. Phys. Rev. E. 2008;78(1):015101(R). doi: 10.1103/PhysRevE.78.015101. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Nocedal J, Wright SJ. Numerical Optimization. Springer; 2006. [Google Scholar]

[CR26] 26.Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004. [Google Scholar]

[CR27] 27.Chung F, Lu L. Connected components in random graphs with given expected degree sequences. Ann. Combinatorics. 2002;6:125–145. doi: 10.1007/PL00012580. [DOI] [Google Scholar]

[CR28] 28.Oshio, K., Iwasaki, Y., Morita, S., Osana, Y., Gomi, S., Akiyama, E., Omata, K., Oka, K. & Kawamura, K. Database of Synaptic Connectivity of C. elegans, Technical Report of CCeP, Keio Future3, (Keio University, 2003).

[CR29] 29.Colizza V, Pastor-Satorras R, Vespignani A. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat. Phys. 2007;3:276–282. doi: 10.1038/nphys560. [DOI] [Google Scholar]

[CR30] 30.Database of Interacting Proteins and can be found at the following URL: http://dip.doe-mbi.ucla.edu/dip/Main.cgi

[CR31] 31.Colizza V, Flammini A, Serrano MA, Vespignani A. Detecting rich-club ordering in complex networks. Nat. Phys. 2006;2:110–115. doi: 10.1038/nphys209. [DOI] [Google Scholar]

[CR32] 32.Lin J-H, Primicerio K, Squartini T, Decker C, Tessone CJ. Lightning Network: a second path towards centralisation of the Bitcoin economy. New J. Phys. 2020;22:083022. doi: 10.1088/1367-2630/aba062. [DOI] [Google Scholar]

[CR33] 33.Miller, J. C. & Hagberg, A. Efficient generation of networks with given expected degrees, LNCS 6732. (eds Frieze, A., Horn, P. & Pralat P.) 115–126 (Springer, 2011).

[CR34] 34.Squartini T, Fagiolo G, Garlaschelli D. Randomizing world trade. I. A binary network analysis. Phys. Rev. E. 2011;84:046117. doi: 10.1103/PhysRevE.84.046117. [DOI] [PubMed] [Google Scholar]

[CR35] 35.Bovet, A., Campajola, C., Mottes, F., Restocchi, V., Vallarano, N., Squartini, T. & Tessone, C. J.The evolving liaisons between the transaction networks of Bitcoin and its price dynamics, arXiv:1907.03577 (2019).

[CR36] 36.Caldarelli Guido, De Nicola Rocco, Petrocchi Marinella, Pratelli Manuel, Saracco Fabio. Flow of online misinformation during the peak of the COVID-19 pandemic in Italy. EPJ Data Science. 2020 doi: 10.1140/epjds/s13688-021-00289-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Saracco F, Di Clemente R, Gabrielli A, Squartini T. Randomizing bipartite networks: the case of the World Trade Web. Sci. Rep. 2015;5:10595. doi: 10.1038/srep10595. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Mastrandrea R, Squartini T, Fagiolo G, Garlaschelli D. Intensive and extensive biases in economic networks: reconstructing world trade. New J. Phys. 2014;16:043022. doi: 10.1088/1367-2630/16/4/043022. [DOI] [PubMed] [Google Scholar]

[CR39] 39.Garlaschelli D, Loffredo MI. Generalized bose-fermi statistics and structural correlations in weighted networks. Phys. Rev. Lett. 2009;102(3):038701. doi: 10.1103/PhysRevLett.102.038701. [DOI] [PubMed] [Google Scholar]

[CR40] 40.Mastrandrea R, Squartini T, Fagiolo G, Garlaschelli D. Reconstructing the world trade multiplex: the role of intensive and extensive biases. Phys. Rev. E. 2014;90(6):062804. doi: 10.1103/PhysRevE.90.062804. [DOI] [PubMed] [Google Scholar]

[CR41] 41.Gleditsch K. Expanded trade and GDP data. J. Conflict Resol. 2002;46:712–24. doi: 10.1177/0022002702046005006. [DOI] [Google Scholar]

[CR42] 42.Iori G, De Masi G, Precup OV, Gabbi G, Caldarelli G. A network analysis of the Italian overnight money market. J. Econ. Dyn. Control. 2006;32(1):259–278. doi: 10.1016/j.jedc.2007.01.032. [DOI] [Google Scholar]

[CR43] 43.Parisi F, Squartini T, Garlaschelli D. A faster horse on a safer trail: generalized inference for the efficient reconstruction of weighted networks. New J. Phys. 2020;22:053053. doi: 10.1088/1367-2630/ab74a7. [DOI] [Google Scholar]

[CR44] 44.Cimini G, Squartini T, Gabrielli A, Garlaschelli D. Estimating topological properties of weighted networks from limited information. Phys. Rev. E. 2015;92:040802. doi: 10.1103/PhysRevE.92.040802. [DOI] [PubMed] [Google Scholar]

[CR45] 45.Cimini G, Squartini T, Gabrielli A, Garlaschelli D. Systemic risk analysis on reconstructed economic and financial networks. Sci. Rep. 2015;5:15758. doi: 10.1038/srep15758. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints

Nicolò Vallarano

Matteo Bruno

Emiliano Marchese

Giuseppe Trapani

Fabio Saracco

Giulio Cimini

Mario Zanon

Tiziano Squartini

Abstract

Introduction

General theory

The Maximum Entropy Principle

The maximum likelihood principle

Figure 1.

Combining the MEP and the MLP principles

Optimization algorithms for non-linear problems

Applications

Figure 2.

Figure 3.

UBCM: binary undirected graphs with given degree sequence

Table 1.

DBCM: binary directed graphs with given in-degree and out-degree sequences

Table 2.

BiCM: bipartite binary undirected graphs with given degree sequences

Table 3.

UECM: weighted undirected graphs with given strengths and degrees

Table 4.

DECM: weighted directed graphs with given strengths and degrees

Table 5.

Two-step models for undirected and directed networks

Table 6.

Table 7.

Discussion

Table 8.

Conclusions

Supplementary Information

Acknowledgements

Appendix A: computing the Hessian matrix

UBCM: binary undirected graphs with given degree sequence

DBCM: binary directed graphs with given in-degree and out-degree sequences

UECM: weighted undirected graphs with given strengths and degrees

DECM: weighted directed graphs with given strengths and degrees

Two-step models for undirected and directed networks

Appendix B: a note on the change of variables

Appendix C: fixed point method in the multivariate case

Author contributions

Competing interests

Footnotes

Supplementary Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases