Abstract
We consider spectral methods that uncover hidden structures in directed networks. We establish and exploit connections between node reordering via (a) minimizing an objective function and (b) maximizing the likelihood of a random graph model. We focus on two existing spectral approaches that build and analyse Laplacian-style matrices via the minimization of frustration and trophic incoherence. These algorithms aim to reveal directed periodic and linear hierarchies, respectively. We show that reordering nodes using the two algorithms, or mapping them onto a specified lattice, is associated with new classes of directed random graph models. Using this random graph setting, we are able to compare the two algorithms on a given network and quantify which structure is more likely to be present. We illustrate the approach on synthetic and real networks, and discuss practical implementation issues.
Keywords: directed graph, spectral methods, network model, community structure, graph embedding, graph Laplacian
1. Motivation
Uncovering structure by clustering or reordering nodes is an important and widely studied topic in network science [1,2]. The issue is especially challenging if we move from undirected to directed networks, because there is a greater variety of possible structures. For example, even a simple motif of three connected nodes has 13 distinct forms [3, fig. 1a]. Moreover, when spectral methods are employed, directed edges lead to asymmetric eigenproblems [4–7]. Our objective in this work is to study spectral (Laplacian-based) methods for directed networks that aim to reveal clustered, directed, hierarchical structure; that is, groups of nodes that are related because, when visualized appropriately, one group is seen to have links that are directed towards the next group. This hierarchy may be periodic or linear, depending on whether there are well-defined start and end groups. Figure 1a,b illustrates the two cases. Mapping a network to a linear structure may help us understand the upstreamness and downstreamness of nodes, which is useful, for example, in the study of cascading effects such as social or financial contagion [8]. Similarly, periodic hierarchies have been associated with sustainability and risk management issues in commerce [9], and also with the existence of echo chambers in online social media [10].
Figure 1.
Directed networks with (a) periodic hierarchy (edges point from nodes in one cluster to nodes in the next cluster, counterclockwise) and (b) linear hierarchy (edges point from nodes in one level to nodes in the next highest level). Node colours indicate the three clusters.
Of course, on real data, these structures may not be so pronounced; hence, in addition to visualizing the reordered network, we are interested in quantifying the relative strength of each type of signal. Laplacian-based methods are often motivated from the viewpoint of optimizing an objective function. This work focuses on two such methods. Minimizing frustration leads to the magnetic Laplacian, which may be used to reveal periodic hierarchy [5,11]. Minimizing trophic incoherence leads to what we call the trophic Laplacian, which may be used to reveal linear hierarchy [6]. We will exploit the idea of associating a spectral method with a generative random graph model. This in turn allows us to compare the outputs from spectral methods based on the likelihood of the associated random graph. This connection was proposed in [12] to show that the standard spectral method for undirected networks is equivalent to maximum-likelihood optimization assuming a class of range-dependent random graphs (RDRGs) introduced in [13]. The idea was further pursued in [14], where a likelihood ratio test was developed to determine whether a network with RDRG structure is more linear or periodic.
The main contributions of this work are as follows.
- We propose two new directed random graph models. One model has the unusual property that the probability of an i → j connection is not independent of the probability of the reciprocated j → i connection.
- We establish connections between these random graph models and the algorithms from [11] and [6] that use the magnetic Laplacian and trophic Laplacian, respectively. We do this by showing that reordering nodes, or mapping them onto a specific lattice structure, using these algorithms is equivalent to maximizing the likelihood that the network is generated by the proposed models.
- We show that by calibrating a given network to both models, it is possible to quantify the relative presence of periodic and linear hierarchical structures using a likelihood ratio.
- We illustrate the approach on synthetic and real networks.
The rest of the paper is organized as follows. In the next section, we introduce the magnetic and trophic Laplacian algorithms. Section 3 defines the new classes of random directed graphs and establishes their connection to these spectral methods. Illustrative numerical results on synthetic networks are given in §4, and in §5, we show results on real networks from a range of application areas. We finish with a brief discussion in §6.
2. Magnetic and trophic Laplacians
2.1. Notation
We consider an unweighted directed graph G = (V, E) with node set V and edge set E, with no self-loops. The adjacency matrix A is n × n with Aij = 1 if the edge i → j is in E, and Aij = 0 otherwise. It is convenient to define the symmetrized adjacency matrix $W^{(s)} = (A + A^\top)/2$. The symmetrized degree matrix D is diagonal with Dii = di, where $d_i = \sum_j W^{(s)}_{ij}$ is the average of the in-degree and out-degree of node i. Later, we will consider weighted networks for which each edge i → j has associated with it a non-negative weight wij. In this case, we let Aij = wij. We use $\mathrm{i}$ to denote $\sqrt{-1}$, and we write $x^H$ to denote the conjugate transpose of a vector $x \in \mathbb{C}^n$. We use $\mathcal{P}$ to denote the set of all permutation vectors, that is, all vectors in $\mathbb{R}^n$ with distinct components given by the integers 1, 2, …, n.
2.2. Spectral methods for directed networks
Spectral methods explore properties of graphs through the eigenvalues and eigenvectors of associated matrices [1,2,15,16]. In the undirected case, the standard graph Laplacian L = D − A is widely used for clustering and reordering, along with normalized variants. The directed case has received less attention; however, several extensions of the standard Laplacian have been proposed [7]. We focus on two spectral methods for directed networks, which are discussed in the next two subsections: the magnetic Laplacian algorithm, which reveals periodic flow structures [5,11], and the trophic Laplacian algorithm, which reveals linear hierarchical structures [6]. We choose to study these two algorithms because they have an optimization formulation and, as we show in §3, may be interpreted in terms of random graph models. Here, we briefly mention two other related techniques that do not fit naturally into this framework. The Hermitian matrix method groups nodes into clusters with a strong imbalance of flow between clusters [4]. This approach constructs a skew-symmetric matrix that emphasizes net flow between pairs of nodes but ignores reciprocal edges. A spectral clustering algorithm motivated by random walks was derived in [17] leading to a graph Laplacian for directed networks that was proposed earlier in [18].
2.3. The magnetic Laplacian
Given a network and a vector of angles θ = (θ1, θ2, …, θn)T in [0, 2π), we may define the corresponding frustration
$$\eta(\theta) = \sum_{i,j=1}^{n} W^{(s)}_{ij}\,\bigl|e^{\mathrm{i}\theta_i} - e^{\mathrm{i}(\theta_j + \delta_{ij})}\bigr|^2, \qquad (2.1)$$
where δij = −2πgαij for a given parameter g. Here, αij = 0 if the edge between i and j is reciprocated, that is, Aij = Aji = 1; αij = 1 if the edge i → j is unreciprocated, that is, Aij = 1 and Aji = 0; and αij = −1 if the edge j → i is unreciprocated, that is, Aij = 0 and Aji = 1. For convenience, we also set αij = 0 if i and j are not connected. To understand the definition (2.1), suppose that for a given graph we wish to choose angles that produce low frustration. Each term in (2.1) can make a positive contribution to the frustration if $W^{(s)}_{ij} > 0$; that is, if i and j are involved in at least one edge. In this case, if there is an edge from i to j that is not reciprocated, then we can force this term to be zero by choosing θj = θi + 2πg. If the edge is reciprocated, then we can force the term to be zero by choosing θj = θi. Hence, intuitively, choosing angles to minimize the frustration can be viewed as mapping the nodes into directed clusters on the unit circle in such a way that (a) nodes in the same cluster tend to have reciprocated connections and (b) unreciprocated edges tend to point from source nodes in one cluster to target nodes in the next cluster, periodically. Setting the parameter g = 1/k for some positive integer k indicates that we are looking for k directed clusters.
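The following minimal NumPy sketch computes the frustration directly from the definition. It assumes (2.1) has the form η(θ) = Σ_{i,j} W^{(s)}_ij |e^{iθ_i} − e^{i(θ_j + δ_ij)}|². This form agrees with the zero-frustration choices θ_j = θ_i + 2πg and θ_j = θ_i described above. The function name `frustration` is our own.

```python
import numpy as np

def frustration(A, theta, g):
    """Frustration eta(theta) for an unweighted directed graph (0/1 matrix A).

    A[i, j] = 1 encodes the edge i -> j; g is the rotation parameter."""
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2                      # symmetrized adjacency W^(s)
    alpha = A - A.T                        # +1: i->j unreciprocated, -1: j->i, 0 otherwise
    delta = -2 * np.pi * g * alpha         # delta_ij = -2*pi*g*alpha_ij
    z = np.exp(1j * theta)
    # |e^{i theta_i} - e^{i(theta_j + delta_ij)}|^2 for every ordered pair (i, j)
    diff = z[:, None] - z[None, :] * np.exp(1j * delta)
    return float(np.sum(W * np.abs(diff) ** 2))

# Directed 3-cycle 0 -> 1 -> 2 -> 0 with equally spaced angles.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
theta = 2 * np.pi * np.arange(3) / 3
print(frustration(A, theta, g=1/3), frustration(A, theta, g=0.0))
```

With g = 1/3, every edge advances the angle by exactly 2πg, so the frustration is zero up to rounding. With g = 0, which ignores edge direction, the same angles give η = 9.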
On a real network, it is unlikely that the frustration (2.1) can be reduced to zero, but it is of interest to find a set of angles that give a minimum value. This minimization problem is closely related to the angular synchronization problem [19,20], which estimates angles from noisy measurements of their phase differences $\theta_i - \theta_j \bmod 2\pi$. Moreover, we note that for visualization purposes it makes sense to reorder the rows and columns of the adjacency matrix based on the set of angles that minimizes the frustration. We also note that in [11] the expression in (2.1) for the frustration is normalized through division by a denominator that does not depend on θ. This is immaterial for our purposes, since such a normalization does not change the minimizing set of angles.
The frustration (2.1) is connected to the magnetic Laplacian, which is defined as follows, where ∘ denotes the elementwise, or Hadamard, product between matrices of the same dimension; that is, $(X \circ Y)_{ij} = X_{ij} Y_{ij}$.
Definition 2.1. —
Given g, the magnetic Laplacian L(g) [5,11] is defined as
$$L^{(g)} = D - T^{(g)} \circ W^{(s)},$$
where $T^{(g)}_{ij} = e^{\mathrm{i}\delta_{ij}}$. Here, the transporter matrix T(g) assigns a rotation to each edge according to its direction.
It is straightforward to show that L(g) is a Hermitian matrix. When g = 0 and the graph is undirected, the magnetic Laplacian reduces to the standard graph Laplacian.
The following result, which is implicit in [5,11], shows that the frustration (2.1) may be written as a quadratic form involving the magnetic Laplacian.
Theorem 2.2. —
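Let $\psi \in \mathbb{C}^n$ be such that $\psi_j = e^{\mathrm{i}\theta_j}$; then
$$2\,\psi^H L^{(g)} \psi = \eta(\theta). \qquad (2.2)$$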
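Identity (2.2) and the Hermitian property of L(g) can both be checked numerically. The sketch below assumes the transporter entries are $T^{(g)}_{ij} = e^{\mathrm{i}\delta_{ij}}$ and that (2.1) sums over all ordered pairs (i, j). Under these conventions the quadratic form equals half the frustration. The random test graph and the function names are hypothetical.

```python
import numpy as np

def magnetic_laplacian(A, g):
    """L(g) = D - T(g) o W^(s), with T_ij = exp(i * delta_ij), delta_ij = -2*pi*g*alpha_ij."""
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2                          # symmetrized adjacency
    D = np.diag(W.sum(axis=1))                 # symmetrized degrees
    T = np.exp(-2j * np.pi * g * (A - A.T))    # alpha_ij = A_ij - A_ji for 0/1 matrices
    return D - T * W                           # '*' is the Hadamard product

def frustration(A, theta, g):
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2
    delta = -2 * np.pi * g * (A - A.T)
    z = np.exp(1j * theta)
    return float(np.sum(W * np.abs(z[:, None] - z[None, :] * np.exp(1j * delta)) ** 2))

rng = np.random.default_rng(0)
n, g = 8, 0.25
A = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(A, 0.0)                       # no self-loops
theta = rng.uniform(0, 2 * np.pi, n)
psi = np.exp(1j * theta)                       # psi_j = e^{i theta_j}
L = magnetic_laplacian(A, g)
quad = np.real(psi.conj() @ L @ psi)           # psi^H L psi (real since L is Hermitian)
print(np.allclose(L, L.conj().T), np.isclose(frustration(A, theta, g), 2 * quad))
```

Both checks print True.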
Appealing to the Rayleigh–Ritz theorem [21], the quadratic form on the left-hand side of (2.2) is minimized over all $\psi \in \mathbb{C}^n$ with $\psi^H \psi = n$ by taking ψ to be an eigenvector corresponding to the smallest eigenvalue of the magnetic Laplacian. Now, such an eigenvector will not generally be proportional to a vector with components of the form $e^{\mathrm{i}\theta_j}$. However, a useful heuristic is to force this relationship in a componentwise sense; that is, to assign to each θj the phase angle of ψj, effectively solving a relaxed version of the desired minimization problem. This leads to algorithm 1, as used in [11].
Algorithm 1.
Magnetic Laplacian algorithm.
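| Result: Phase angles θ of nodes |
| Input adjacency matrix A; |
| Symmetrize adjacency matrix $W^{(s)} = (A + A^\top)/2$; |
| Calculate degree matrix $D = \mathrm{diag}(d_1, \dots, d_n)$ with $d_i = \sum_j W^{(s)}_{ij}$; |
| Construct transporter $T^{(g)}_{ij} = e^{\mathrm{i}\delta_{ij}}$; |
| Calculate magnetic Laplacian $L^{(g)} = D - T^{(g)} \circ W^{(s)}$; |
| Compute eigenvectors and associated eigenvalues; |
| Calculate phase angles $\theta_j = \arg(\psi_j)$ using the eigenvector ψ associated with the smallest eigenvalue; |
| Reorder nodes with θ or visualize with $e^{\mathrm{i}\theta_j}$ on the unit circle |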
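Below is a minimal dense NumPy sketch of algorithm 1; the function name is our own. It uses `numpy.linalg.eigh`, which returns the eigenvalues of a Hermitian matrix in ascending order. For large sparse networks, a sparse Hermitian eigensolver such as `scipy.sparse.linalg.eigsh` would be the natural substitute. The eigenvector is defined only up to a global phase, so only differences between angles are meaningful.

```python
import numpy as np

def magnetic_laplacian_angles(A, g):
    """Sketch of algorithm 1: phase angles from the eigenvector of L(g)
    associated with its smallest eigenvalue."""
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2                               # symmetrize
    D = np.diag(W.sum(axis=1))                      # degree matrix
    T = np.exp(-2j * np.pi * g * (A - A.T))         # transporter exp(i*delta_ij)
    L = D - T * W                                   # magnetic Laplacian (Hermitian)
    _, evecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    psi = evecs[:, 0]                               # eigenvector of the smallest eigenvalue
    return np.mod(np.angle(psi), 2 * np.pi)         # phase angle of each component

# Directed 3-cycle 0 -> 1 -> 2 -> 0, looking for k = 3 directed clusters.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
theta = magnetic_laplacian_angles(A, g=1/3)
order = np.argsort(theta)                           # node ordering by phase angle
A_reordered = A[np.ix_(order, order)]
steps = np.mod(theta[[1, 2, 0]] - theta[[0, 1, 2]], 2 * np.pi)   # angle change along each edge
print(steps)
```

On this cycle, each edge advances the estimated angle by 2π/3. This is the zero-frustration configuration for g = 1/3.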
2.4. The trophic Laplacian
The idea of discovering a linear directed hierarchy arises in many contexts where edges represent dominance or approval, including the ranking of sports teams [22] and Web pages [23]. A particularly well-defined case is the quantification of trophic levels in food webs, where each directed edge represents a consumer–resource relationship [24–26]. We focus here on the approach in [6], where the aim is to assign a trophic level hi to each node i such that along any directed edge the trophic level increases by one. This motivates the minimization of the trophic incoherence
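$$F(h) = \frac{\sum_{i,j} A_{ij}\,(h_j - h_i - 1)^2}{\sum_{i,j} A_{ij}}. \qquad (2.3)$$
Denoting the total weight of node i as $u_i = \sum_j (A_{ij} + A_{ji})$ and the imbalance as $v_i = \sum_j (A_{ji} - A_{ij})$, the trophic level vector $h \in \mathbb{R}^n$ that minimizes the trophic incoherence solves the linear system of equations
$$\Lambda h = v, \qquad (2.4)$$
where $\Lambda = \mathrm{diag}(u) - A - A^\top$. For a weakly connected network, the solution to (2.4) is unique up to a constant shift [6]. Since it employs a Laplacian-style matrix, Λ, we refer to it as the trophic Laplacian algorithm; see algorithm 2.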
Algorithm 2.
Trophic Laplacian algorithm.
| Result: The trophic levels h |
| Input adjacency matrix A; |
| Calculate the node weights $u_i = \sum_j (A_{ij} + A_{ji})$; |
| Calculate the node imbalances $v_i = \sum_j (A_{ji} - A_{ij})$; |
| Calculate the trophic Laplacian $\Lambda = \mathrm{diag}(u) - A - A^\top$; |
| Solve the linear system (2.4); |
| Reorder or visualize nodes using h |
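Algorithm 2 reduces to a single linear solve. Because Λ is singular (constant vectors lie in its null space), the sketch below takes the minimum-norm least-squares solution from `numpy.linalg.lstsq`. It then fixes the free shift by setting the lowest level to zero. That normalization and the function names are our own choices, not part of [6].

```python
import numpy as np

def trophic_levels(A):
    """Sketch of algorithm 2: solve Lambda h = v with Lambda = diag(u) - A - A^T."""
    A = np.asarray(A, dtype=float)
    w_in, w_out = A.sum(axis=0), A.sum(axis=1)
    u = w_in + w_out                                  # total weight u_i
    v = w_in - w_out                                  # imbalance v_i
    Lam = np.diag(u) - A - A.T                        # trophic Laplacian
    h, *_ = np.linalg.lstsq(Lam, v, rcond=None)       # minimum-norm solution of the singular system
    return h - h.min()                                # fix the free shift: lowest level is 0

def trophic_incoherence(A, h):
    """F(h) = sum_ij A_ij (h_j - h_i - 1)^2 / sum_ij A_ij."""
    A = np.asarray(A, dtype=float)
    return float(np.sum(A * (h[None, :] - h[:, None] - 1) ** 2) / A.sum())

A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])      # chain 0 -> 1 -> 2
h = trophic_levels(A)
print(h, trophic_incoherence(A, h))
```

For the chain 0 → 1 → 2, this returns levels 0, 1, 2 and zero incoherence.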
3. Random graph interpretation
In this section, we associate two new random graph models with the magnetic and trophic Laplacian algorithms, using a similar approach to the work in [12]. After establishing these connections, we proceed as in [14] and propose a maximum-likelihood test to compare the two models on a given network.
3.1. The directed pRDRG model
Given a set of phase angles $\theta \in [0, 2\pi)^n$, we will define a model for unweighted, directed random graphs. The model generates connections between each pair of distinct nodes i and j with four possible outcomes (a pair of reciprocated edges, an unreciprocated edge from i to j, an unreciprocated edge from j to i, or no edges), as follows:
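$$\mathbb{P}(A_{ij} = 1,\ A_{ji} = 1) = f(\theta_i, \theta_j), \qquad (3.1)$$
$$\mathbb{P}(A_{ij} = 1,\ A_{ji} = 0) = q(\theta_i, \theta_j), \qquad (3.2)$$
$$\mathbb{P}(A_{ij} = 0,\ A_{ji} = 1) = l(\theta_i, \theta_j), \qquad (3.3)$$
$$\mathbb{P}(A_{ij} = 0,\ A_{ji} = 0) = 1 - f(\theta_i, \theta_j) - q(\theta_i, \theta_j) - l(\theta_i, \theta_j), \qquad (3.4)$$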
where f, q and l are functions that define the model, and, of course, they must be chosen such that all probabilities lie between zero and one. We emphasize that this model has a feature that distinguishes it from typical random graph models, including directed Erdős–Rényi and small-world style versions [27]: in general, the edges i → j and j → i are not generated independently of each other.
We are interested here in the inverse problem where we are given a graph and a model (3.1)–(3.4), and we wish to infer the phase angles. This task arises naturally when the nodes are supplied in some arbitrary order. We will assume that the phase angles are to be assigned values from a discrete set $\{\nu_1, \nu_2, \dots, \nu_n\}$; that is, we must set $\theta_i = \nu_{p_i}$, where p is a permutation vector. This setting includes the cases of (directed) clustering and reordering. For example, with n = 12, we could specify ν1 = ν2 = ν3 = 0, ν4 = ν5 = ν6 = π/2, ν7 = ν8 = ν9 = π, and ν10 = ν11 = ν12 = 3π/2, in order to assign the nodes to four directed clusters of equal size. Alternatively, νi = (i − 1)2π/12 would assign the nodes to equally spaced phase angles, as shown in figure 2a, as a means to reorder the graph. The following theorem shows that solving this type of inverse problem for suitable f, q and l is equivalent to minimizing the frustration.
Figure 2.
(a) Points uniformly distributed on the unit circle and (b) a sphere.
Theorem 3.1. —
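Suppose $\theta \in [0, 2\pi)^n$ is constrained to take values such that $\theta_i = \nu_{p_i}$, where p is a permutation vector. Then minimizing the frustration η(θ) in (2.1) over all such θ is equivalent to maximizing the likelihood that the graph came from a model of the form (3.1)–(3.4) in the case where
$$f(\theta_i, \theta_j) = \frac{e^{-2\gamma(1 - \cos\beta_{ij})}}{Z_{ij}}, \qquad q(\theta_i, \theta_j) = \frac{e^{2\gamma(\cos(\beta_{ij} + 2\pi g) - \cos\beta_{ij})}}{Z_{ij}}, \qquad l(\theta_i, \theta_j) = \frac{e^{2\gamma(\cos(\beta_{ij} - 2\pi g) - \cos\beta_{ij})}}{Z_{ij}},$$
with βij = θi − θj and normalization constant
$$Z_{ij} = e^{-2\gamma(1 - \cos\beta_{ij})} + e^{2\gamma(\cos(\beta_{ij} + 2\pi g) - \cos\beta_{ij})} + e^{2\gamma(\cos(\beta_{ij} - 2\pi g) - \cos\beta_{ij})} + e^{2\gamma(1 - \cos\beta_{ij})},$$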
for any positive constant γ.
Proof. —
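We first note that, since δji = −δij for i ≠ j and $W^{(s)}_{ij} = W^{(s)}_{ji}$, the (i, j) and (j, i) terms in (2.1) are equal. We may therefore express η(θ) (equation (2.1)) as a sum over pairs with i < j:
$$\eta(\theta) = 2\sum_{i<j} W^{(s)}_{ij}\,\bigl|e^{\mathrm{i}\theta_i} - e^{\mathrm{i}(\theta_j + \delta_{ij})}\bigr|^2 = 4\sum_{i<j} W^{(s)}_{ij}\bigl(1 - \cos(\beta_{ij} - \delta_{ij})\bigr). \qquad (3.5)$$
Then, distinguishing between the three different ways in which each i and j may be connected, we have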
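$$\eta(\theta) = 4\sum_{i<j:\,A_{ij}=A_{ji}=1}\bigl(1 - \cos\beta_{ij}\bigr) \qquad (3.6)$$
$$\qquad + 2\sum_{i<j:\,A_{ij}=1,\,A_{ji}=0}\bigl(1 - \cos(\beta_{ij} + 2\pi g)\bigr) + 2\sum_{i<j:\,A_{ij}=0,\,A_{ji}=1}\bigl(1 - \cos(\beta_{ij} - 2\pi g)\bigr). \qquad (3.7)$$
Write $f_{ij} = f(\theta_i, \theta_j)$, $q_{ij} = q(\theta_i, \theta_j)$, $l_{ij} = l(\theta_i, \theta_j)$, and let $r_{ij} = 1 - f_{ij} - q_{ij} - l_{ij}$ denote the probability that i and j are not connected. The likelihood $\mathcal{L}$ of the graph G from a model of the form (3.1)–(3.4) is then given by
$$\mathcal{L} = \prod_{i<j:\,A_{ij}=A_{ji}=1} f_{ij} \prod_{i<j:\,A_{ij}=1,\,A_{ji}=0} q_{ij} \prod_{i<j:\,A_{ij}=0,\,A_{ji}=1} l_{ij} \prod_{i<j:\,A_{ij}=A_{ji}=0} r_{ij},$$
which we may rewrite as
$$\mathcal{L} = \prod_{i<j:\,A_{ij}=A_{ji}=1} \frac{f_{ij}}{r_{ij}} \prod_{i<j:\,A_{ij}=1,\,A_{ji}=0} \frac{q_{ij}}{r_{ij}} \prod_{i<j:\,A_{ij}=0,\,A_{ji}=1} \frac{l_{ij}}{r_{ij}} \prod_{i<j} r_{ij}.$$
The final factor on the right-hand side, which is the probability of the null graph, takes the same value for any θ such that $\theta_i = \nu_{p_i}$. This holds because $r_{ij}$ does not depend on the order of its arguments and, as p ranges over permutations, the collection of unordered pairs $\{\theta_i, \theta_j\}$, i < j, is always the same. We may therefore ignore this factor when maximizing the likelihood. Then, taking the logarithm and negating, we see that maximizing the likelihood is equivalent to minimizing the expression
$$\sum_{i<j:\,A_{ij}=A_{ji}=1} \log\frac{r_{ij}}{f_{ij}} \qquad (3.8)$$
$$+ \sum_{i<j:\,A_{ij}=1,\,A_{ji}=0} \log\frac{r_{ij}}{q_{ij}} \qquad (3.9)$$
$$+ \sum_{i<j:\,A_{ij}=0,\,A_{ji}=1} \log\frac{r_{ij}}{l_{ij}}. \qquad (3.10)$$
Comparing terms in (3.8)–(3.10) and (3.6)–(3.7), we see that the two minimization problems are equivalent if
$$\log\frac{r_{ij}}{f_{ij}} = 4\gamma\bigl(1 - \cos\beta_{ij}\bigr), \qquad \log\frac{r_{ij}}{q_{ij}} = 2\gamma\bigl(1 - \cos(\beta_{ij} + 2\pi g)\bigr), \qquad \log\frac{r_{ij}}{l_{ij}} = 2\gamma\bigl(1 - \cos(\beta_{ij} - 2\pi g)\bigr),$$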
where we may choose any positive constant γ since the minimization problems are scale invariant. Solving for f, q and l as functions of θi and θj we arrive at the model in the statement of the theorem. ▪
For the model in theorem 3.1, the probability of an edge from node i to node j depends on the phase difference βij = θi − θj, the decay rate γ, and the parameter g. We see that γ determines how rapidly the edge probability varies with the phase difference. In the extreme case when γ = 0, we obtain f(θi, θj) = q(θi, θj) = l(θi, θj) = 1/4, and thus the model reduces to a conditional Erdős–Rényi form. In addition, as γ increases the graph generally becomes more sparse. This is because the probability of disconnection, exp [2γ(1 − cos(θi − θj))]/Zij, is greater than or equal to the probability of each of the other three outcomes.
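The sketch below evaluates the directed pRDRG log-likelihood and checks theorem 3.1 numerically. Over permutations of a fixed set of angles, −log L − γη(θ) should not change. It assumes the four outcome weights exp[−2γ(1 − cos β)], exp[2γ(cos(β + 2πg) − cos β)], exp[2γ(cos(β − 2πg) − cos β)] and exp[2γ(1 − cos β)]. These correspond to a reciprocated pair, i → j only, j → i only and no edge, respectively, and each is divided by Z_ij. This is our reading of the theorem: it reproduces the disconnection probability quoted above and the value 1/4 at γ = 0. The graph, parameter values and function names are hypothetical.

```python
import numpy as np

def pair_probs(beta, g, gamma):
    """Directed pRDRG outcome probabilities for beta = theta_i - theta_j.

    Rows: reciprocated (f), i -> j only (q), j -> i only (l), no edge.
    Each column is divided by its sum, i.e. by Z_ij."""
    c = np.cos(beta)
    w = np.exp(2 * gamma * np.stack([
        c - 1,                               # f: exp(-2*gamma*(1 - cos beta))
        np.cos(beta + 2 * np.pi * g) - c,    # q
        np.cos(beta - 2 * np.pi * g) - c,    # l
        1 - c,                               # no edge
    ]))
    return w / w.sum(axis=0)

def log_likelihood(A, theta, g, gamma):
    A = np.asarray(A).astype(bool)
    i, j = np.triu_indices(len(theta), k=1)           # each unordered pair once
    P = pair_probs(theta[i] - theta[j], g, gamma)
    # index of the observed outcome for each pair, in the row order above
    out = np.where(A[i, j] & A[j, i], 0,
                   np.where(A[i, j], 1, np.where(A[j, i], 2, 3)))
    return float(np.sum(np.log(P[out, np.arange(len(i))])))

def frustration(A, theta, g):
    A = np.asarray(A, dtype=float)
    W = (A + A.T) / 2
    delta = -2 * np.pi * g * (A - A.T)
    z = np.exp(1j * theta)
    return float(np.sum(W * np.abs(z[:, None] - z[None, :] * np.exp(1j * delta)) ** 2))

# Over permutations of a fixed set of angles nu,
# -log L(theta) - gamma * eta(theta) takes the same value.
rng = np.random.default_rng(1)
n, g, gamma = 10, 0.25, 1.5
A = rng.random((n, n)) < 0.3
np.fill_diagonal(A, False)
nu = 2 * np.pi * np.arange(n) / n
t1, t2 = nu[rng.permutation(n)], nu[rng.permutation(n)]
c1 = -log_likelihood(A, t1, g, gamma) - gamma * frustration(A, t1, g)
c2 = -log_likelihood(A, t2, g, gamma) - gamma * frustration(A, t2, g)
print(np.isclose(c1, c2))
```

This prints True, consistent with the likelihood ranking angle assignments exactly as the frustration does.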
We note that having applied the magnetic Laplacian algorithm to estimate θ, there are two straightforward approaches to estimating γ. One way is to maximize the graph likelihood over γ > 0. Another is to choose γ so that the expected edge density from the random graph model matches the edge density of the given network. We illustrate these approaches in §4.
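One way to implement both γ estimates for fixed θ and g is a grid search. This is our choice; the text does not prescribe an optimizer. The expected number of directed edges counts a reciprocated pair twice. The outcome probabilities are our reading of theorem 3.1: weights exp[−2γ(1 − cos β)], exp[2γ(cos(β + 2πg) − cos β)], exp[2γ(cos(β − 2πg) − cos β)] and exp[2γ(1 − cos β)], normalized by their sum. The test graph (K = 4 clusters of 25 nodes, γ = 5) and the function names are hypothetical.

```python
import numpy as np

def pair_probs(beta, g, gamma):
    # Outcome probabilities (i<->j, i->j only, j->i only, none) for beta = theta_i - theta_j
    c = np.cos(beta)
    w = np.exp(2 * gamma * np.stack([c - 1, np.cos(beta + 2 * np.pi * g) - c,
                                     np.cos(beta - 2 * np.pi * g) - c, 1 - c]))
    return w / w.sum(axis=0)

def fit_gamma(A, theta, g, grid):
    """Return (gamma maximizing the likelihood, gamma matching the edge count) on a grid."""
    A = np.asarray(A).astype(bool)
    i, j = np.triu_indices(len(theta), k=1)
    out = np.where(A[i, j] & A[j, i], 0, np.where(A[i, j], 1, np.where(A[j, i], 2, 3)))
    loglik, expected = [], []
    for gamma in grid:
        P = pair_probs(theta[i] - theta[j], g, gamma)
        loglik.append(np.sum(np.log(P[out, np.arange(len(i))])))
        expected.append(np.sum(2 * P[0] + P[1] + P[2]))   # expected number of directed edges
    k_mle = int(np.argmax(loglik))
    k_den = int(np.argmin(np.abs(np.array(expected) - A.sum())))
    return grid[k_mle], grid[k_den]

# Hypothetical test graph drawn from the model itself.
rng = np.random.default_rng(2)
K, m, g, gamma_true = 4, 25, 0.25, 5.0
theta = np.repeat(2 * np.pi * np.arange(K) / K, m) + rng.uniform(-0.2, 0.2, K * m)
n = K * m
i, j = np.triu_indices(n, k=1)
P = pair_probs(theta[i] - theta[j], g, gamma_true)
out = np.minimum((rng.random(len(i)) > np.cumsum(P, axis=0)).sum(axis=0), 3)  # inverse-CDF draw
A = np.zeros((n, n), dtype=bool)
A[i, j] = (out == 0) | (out == 1)
A[j, i] = (out == 0) | (out == 2)
gamma_mle, gamma_density = fit_gamma(A, theta, g, np.linspace(0.5, 10.0, 39))
print(gamma_mle, gamma_density)
```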
Remark 3.2. —
Since the edge probabilities are functions of the phase differences and have a periodicity of 2π, this model resembles the periodic range-dependent random graph (pRDRG) model in [14], which generates an undirected edge between i and j with probability f(min{|j − i|, n − |j − i|}) for a given decay function f. We will therefore use the term directed periodic range-dependent random graph model (directed pRDRG) to describe the model in theorem 3.1.
3.2. The trophic range-dependent model
Now, given a set of trophic levels $h \in \mathbb{R}^n$, we define an unweighted, directed random graph model where
$$\mathbb{P}(i \to j) = f(h_i, h_j) \qquad (3.11)$$
and
$$\mathbb{P}(j \to i) = f(h_j, h_i) \qquad (3.12)$$
for some function f. Here, the edge i → j is generated independently of the edge j → i.
Following our treatment of the directed pRDRG case, we are now interested in the inverse problem where we are given a graph and the model (3.11)–(3.12), and we wish to infer the trophic levels. We will assume that the trophic levels are to be assigned values from a discrete set $\{\nu_1, \nu_2, \dots, \nu_n\}$; that is, we must set $h_i = \nu_{p_i}$, where p is a permutation vector. This setting includes the case of assigning nodes to trophic levels of specified size; for example, with n = 12, we could set ν1 = ν2 = ν3 = 1, ν4 = ν5 = ν6 = 2, ν7 = ν8 = ν9 = 3 and ν10 = ν11 = ν12 = 4, in order to assign the nodes to four equal levels. Alternatively, νi = i would assign each node to its own level, which is equivalent to reordering the nodes. The following theorem shows that solving this type of inverse problem for suitable f is equivalent to minimizing the trophic incoherence.
Theorem 3.3. —
Suppose $h \in \mathbb{R}^n$ is constrained to take values such that $h_i = \nu_{p_i}$, where p is a permutation vector. Then minimizing the trophic incoherence F(h) in (2.3) over all such h is equivalent to maximizing the likelihood that the graph came from a model of the form (3.11)–(3.12) in the case where
$$f(h_i, h_j) = \frac{1}{1 + e^{\gamma (h_j - h_i - 1)^2}}$$
for any positive γ.
Proof. —
Noting that the denominator in (2.3) is independent of the choice of h, this result is a special case of theorem 3.5 below, with I(hi, hj) = (hj − hi − 1)2. ▪
For the model in theorem 3.3, the probability of an edge i → j is a function of the shifted, directed, squared difference in levels, $(h_j - h_i - 1)^2$. The larger this value, the lower the probability. Within the same level, where hi = hj, the probability is $1/(1 + e^{\gamma})$. The edge probability takes its maximum value of 1/2 when hj − hi = 1, that is, when the edge starts at one level and finishes at the next highest level. We also see that the overall expected edge density is always smaller than 1/2. Across different levels, where hi ≠ hj, the edge i → j and the edge j → i are not generated with the same probability. If |hj − hi − 1| < |hi − hj − 1|, the edge i → j is more likely than j → i. The two edge probabilities are equal if and only if hi = hj. Therefore, this model could be interpreted as a combination of an Erdős–Rényi model within the same level and a directed (non-periodic) range-dependent model across different levels.
The parameter γ controls the decay rate of the likelihood as the shifted, directed, squared difference in levels increases. When hj − hi = 1, γ plays no role. If γ = 0, the model reduces to Erdős–Rényi with an edge probability of 1/2. As γ → ∞, the edge probability tends to zero if hj − hi ≠ 1. In this case, the model will generate a multipartite graph where edges are only possible in one direction between adjacent levels, and this happens with probability 1/2. As mentioned previously in §3.1 and illustrated in §4, γ can be fitted from a maximum likelihood estimate or by matching the edge density.
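The sketch below implements the trophic RDRG of theorem 3.3: edge probabilities, independent Bernoulli sampling and the log-likelihood. The function names and the small level vector are hypothetical.

```python
import numpy as np

def trophic_edge_prob(h, gamma):
    """P[i, j] = 1 / (1 + exp(gamma * (h_j - h_i - 1)^2)), zero on the diagonal."""
    d = h[None, :] - h[:, None] - 1.0
    P = 1.0 / (1.0 + np.exp(gamma * d ** 2))
    np.fill_diagonal(P, 0.0)                       # no self-loops
    return P

def trophic_log_likelihood(A, h, gamma):
    P = trophic_edge_prob(h, gamma)
    off = ~np.eye(len(h), dtype=bool)              # ordered pairs i != j
    A = np.asarray(A, dtype=bool)
    return float(np.sum(np.log(np.where(A, P, 1.0 - P)[off])))

h = np.array([1.0, 1.0, 2.0, 3.0])
P = trophic_edge_prob(h, gamma=5.0)
print(P[0, 2], P[0, 1], P[2, 0])                   # next level up, same level, two levels down
rng = np.random.default_rng(3)
A = rng.random(P.shape) < P                        # independent Bernoulli draws
print(trophic_log_likelihood(A, h, 5.0))
```

The probability is exactly 0.5 for an edge one level up (P[0, 2]) and 1/(1 + e^5) within a level (P[0, 1]). A downward edge spanning two levels (P[2, 0]) has probability 1/(1 + e^20).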
We note that the definition of trophic incoherence in (2.3) and the resulting trophic Laplacian algorithm make sense for a non-negatively weighted graph, in which case we have the following result. Here, to be concrete we assume that weights lie strictly between zero and one. Similar results can be obtained for weights from a discrete distribution.
Theorem 3.4. —
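Suppose $h \in \mathbb{R}^n$ is constrained to take values such that $h_i = \nu_{p_i}$, where p is a permutation vector. Then minimizing the trophic incoherence F(h) in (2.3) over all such h for a weighted graph with weights in (0, 1) is equivalent to maximizing the likelihood that the graph came from a model where each edge weight Aij is independent with density function
$$f_{ij}(w) = \frac{e^{-\gamma w (h_j - h_i - 1)^2}}{Z_{ij}}, \qquad w \in (0, 1), \qquad (3.13)$$
for any positive γ, where $Z_{ij} = \int_0^1 e^{-\gamma s (h_j - h_i - 1)^2}\,\mathrm{d}s$ is a normalization factor.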
Proof. —
This is a special case of theorem 3.6 below, where I(hi, hj) = (hj − hi − 1)2. ▪
3.3. Generalized random graph model
The results in §§3.1 and 3.2 exploit the form of the objective function: the sum over all edges of a kernel function can be viewed as the sum of log-likelihoods. This shows that the minimization problem is equivalent to maximizing the likelihood of an associated random graph model, in the setting where we assign nodes to a discrete set of scalar values. The restriction to discrete values is used in the proofs to make the probability of the null graph constant. However, we emphasize that in practice the relaxed versions of the optimization problems, which are solved by the two algorithms, do not have this restriction. The magnetic Laplacian algorithm produces real-valued phase angles and the trophic Laplacian algorithm produces real-valued trophic levels.
We may extend the connection in theorem 3.3 to the case of higher dimensional node attributes, that is, where we wish to associate each node with a discrete vector from a set $\{\nu_1, \nu_2, \dots, \nu_n\}$, where each $\nu_i \in \mathbb{R}^d$ for some d ≥ 1. This setting arises, for example, if we wish to visualize the network in higher dimension; a natural extension of the ring structure would be to place nodes at regularly spaced points on the surface of the unit sphere, see figure 2b, which we produced with the algorithm in [28]. The next result generalizes theorem 3.3 to this case.
Theorem 3.5. —
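Suppose we have an unweighted directed graph with adjacency matrix A and a kernel function $I : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, and suppose that we are free to assign elements $x_1, \dots, x_n \in \mathbb{R}^d$ to values from the set $\{\nu_1, \dots, \nu_n\}$; that is, we allow $x_i = \nu_{p_i}$ where p is a permutation vector. Then minimizing
$$\sum_{i,j} A_{ij}\, I(x_i, x_j) \qquad (3.14)$$
over all such $x_1, \dots, x_n$ is equivalent to maximizing the likelihood that the graph came from a model where the (independent) probability of the edge i → j is
$$\mathbb{P}(i \to j) = \frac{1}{1 + e^{\gamma I(x_i, x_j)}} \qquad (3.15)$$
for any positive γ.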
Proof. —
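Given $x_1, \dots, x_n$, the probability of generating a graph G from the model stated in the theorem is
$$\mathbb{P}(G) = \prod_{i \ne j:\,A_{ij}=1} \frac{\mathbb{P}(i \to j)}{1 - \mathbb{P}(i \to j)} \prod_{i \ne j} \bigl(1 - \mathbb{P}(i \to j)\bigr).$$
The second factor on the right-hand side, the probability of the null graph, does not depend on the choice of p. So we may ignore this factor, and after taking logs and negating we arrive at the equivalent problem of minimizing
$$\sum_{i,j:\,A_{ij}=1} \log\frac{1 - \mathbb{P}(i \to j)}{\mathbb{P}(i \to j)}. \qquad (3.16)$$
Comparing (3.16) and (3.14), we see that the two minimization problems have the same solution when
$$\log\frac{1 - \mathbb{P}(i \to j)}{\mathbb{P}(i \to j)} = \gamma\, I(x_i, x_j)$$
for any positive γ, and the result follows. ▪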
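Theorem 3.5 can be checked numerically for any kernel. The sketch below uses a hypothetical kernel on ℝ², I(x, y) = ‖y − x − e₁‖², and a 3 × 3 lattice as the allowed values. It then verifies that −log L and γ times (3.14) differ by the same constant for two random assignments. `numpy.logaddexp(0, t)` evaluates log(1 + e^t) stably.

```python
import numpy as np

def kernel(x, y):
    # Hypothetical example kernel: shifted squared distance ||y - x - e1||^2 on R^2
    return np.sum((y - x - np.array([1.0, 0.0])) ** 2, axis=-1)

def objective(A, X):
    I = kernel(X[:, None, :], X[None, :, :])           # I[i, j] = I(x_i, x_j)
    return float(np.sum(A * I))                        # expression (3.14)

def log_likelihood(A, X, gamma):
    I = kernel(X[:, None, :], X[None, :, :])
    logp = -np.logaddexp(0.0, gamma * I)               # log P(i -> j) = -log(1 + e^{gamma I})
    log1mp = gamma * I - np.logaddexp(0.0, gamma * I)  # log(1 - P(i -> j))
    off = ~np.eye(len(X), dtype=bool)                  # ordered pairs i != j
    return float(np.sum(np.where(A, logp, log1mp)[off]))

rng = np.random.default_rng(4)
n, gamma = 9, 0.7
nu = np.array([[a, b] for a in range(3) for b in range(3)], dtype=float)  # allowed values
A = rng.random((n, n)) < 0.3
np.fill_diagonal(A, False)
X1, X2 = nu[rng.permutation(n)], nu[rng.permutation(n)]
lhs = -log_likelihood(A, X1, gamma) + log_likelihood(A, X2, gamma)
rhs = gamma * (objective(A, X1) - objective(A, X2))
print(np.isclose(lhs, rhs))
```

This prints True: the null-graph factor is the same for every assignment, so the likelihood ranks assignments exactly as (3.14) does.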
For the model in theorem 3.5, given $x_i$ and $x_j$, the edge i → j appears according to a Bernoulli distribution with probability $1/(1 + e^{\gamma I(x_i, x_j)})$, and hence with variance
$$\frac{e^{\gamma I(x_i, x_j)}}{\bigl(1 + e^{\gamma I(x_i, x_j)}\bigr)^2}.$$
When $I(x_i, x_j) = 0$, the probability is 1/2 and the variance takes its largest value, 1/4. The edge probability is symmetric in i and j if and only if the function I is symmetric in its arguments. In the case of squared Euclidean distance, $I(x_i, x_j) = \|x_i - x_j\|^2$, and an undirected graph, the relaxed version of the minimization problem is solved by taking d eigenvectors corresponding to the smallest eigenvalues of the standard graph Laplacian.
For completeness, we now state and prove a weighted analogue of theorem 3.5 assuming that weights lie strictly between zero and one. Discrete-valued weights may be dealt with similarly.
Theorem 3.6. —
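Suppose $x_1, \dots, x_n$ may take values from the given set $\{\nu_1, \dots, \nu_n\}$; that is, $x_i = \nu_{p_i}$, where p is a permutation vector. Then, given a weighted graph with weights in (0, 1), minimizing the expression (3.14) over all such $x_1, \dots, x_n$ is equivalent to maximizing the likelihood that the graph came from a model where Aij has (independent) density
$$f_{ij}(w) = \frac{e^{-\gamma w I(x_i, x_j)}}{Z_{ij}}, \qquad w \in (0, 1), \qquad (3.17)$$
for any positive γ, where
$$Z_{ij} = \int_0^1 e^{-\gamma s I(x_i, x_j)}\,\mathrm{d}s$$
is a normalization factor.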
Proof. —
It is straightforward to check that the normalization factor Zij ensures
$$\int_0^1 f_{ij}(w)\,\mathrm{d}w = 1.$$
Now the product over all pairs ∏i,j Zij is independent of the choice of permutation vector p. Hence, under the model defined in the theorem, maximizing the likelihood of the graph G is equivalent to maximizing $\prod_{i,j} Z_{ij} f_{ij}(A_{ij}) = \prod_{i,j} e^{-\gamma A_{ij} I(x_i, x_j)}$. After taking logarithms and negating, we see that the choice (3.17) allows us to match (3.14). ▪
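For each pair, the weighted model of theorem 3.6 is a truncated exponential density on (0, 1) with rate c = γI(x_i, x_j). The minimal NumPy sketch below evaluates this density and samples from it by inverting the CDF F(w) = (1 − e^{−cw})/(1 − e^{−c}). The value c = 3 and the function names are hypothetical.

```python
import numpy as np

def weight_density(w, c):
    """f(w) = exp(-c w) / Z on (0, 1), where c = gamma * I(x_i, x_j) and
    Z = integral_0^1 exp(-c s) ds = (1 - e^{-c}) / c  (Z = 1 when c = 0)."""
    Z = 1.0 if abs(c) < 1e-12 else -np.expm1(-c) / c
    return np.exp(-c * w) / Z

def sample_weights(c, size, rng):
    """Inverse-CDF sampling from F(w) = (1 - e^{-c w}) / (1 - e^{-c})."""
    u = rng.random(size)
    if abs(c) < 1e-12:
        return u                                   # c = 0: uniform on (0, 1)
    return -np.log1p(u * np.expm1(-c)) / c         # solves F(w) = u

rng = np.random.default_rng(5)
c = 3.0                                            # e.g. gamma = 3 and I(x_i, x_j) = 1
w = sample_weights(c, 200_000, rng)
grid = (np.arange(10_000) + 0.5) / 10_000          # midpoint rule on (0, 1)
print(weight_density(grid, c).mean())              # integral of the density
print(w.mean(), 1 / c - 1 / np.expm1(c))           # sample mean vs exact mean
```

The first value is 1 up to quadrature error. The sample mean agrees with the exact mean 1/c − 1/(e^c − 1) to within sampling error.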
Remark 3.7. —
It is natural to ask whether the frustration (2.1) fits into the form (3.14), and hence has an associated random graph model of the form (3.15). We see from (3.5) that the frustration may be written
$$\eta(\theta) = 4\sum_{i<j} W^{(s)}_{ij}\bigl(1 - \cos(\theta_i - \theta_j - \delta_{ij})\bigr).$$
However, the factor $1 - \cos(\theta_i - \theta_j - \delta_{ij})$ depends (through δij) on Aij, and hence we do not have an expression of the form (3.14). This explains why a new type of model, with conditional dependence between the i → j and j → i connections, was needed for theorem 3.1.
3.4. Model comparison
The random graph models appearing in §3 capture the characteristics of linear and periodic directed hierarchies. Hence it may be of interest (a) to analyse properties of these models and (b) to use these models to evaluate the performance of computational algorithms. However, in the remainder of this work we focus on a follow-on topic of more direct practical significance. The magnetic Laplacian and trophic Laplacian algorithms allow us to compute node attributes θ and h for a given graph, leading to unsupervised node ordering. The main computation required in this step is finding dominant eigenvector–eigenvalue pairs. Assuming that the network is sparse (each node has an O(1) degree) and that the power method gives the required accuracy in a finite number of iterations, this is an O(n) computation. Motivated by theorems 3.1 and 3.3, we may then compute the likelihood of the graph for this choice of attributes. This has a complexity of O(n²), since every pair of nodes, connected or not, contributes to the likelihood. By comparing likelihoods, we may quantify which underlying structure is best supported by the data. An extra consideration is that both random graph models involve a free parameter, γ > 0, which is needed to evaluate the likelihood. As discussed earlier, one option is to fit γ to the data, for example by matching the expected edge density from the model with the edge density of the given graph. However, based on our computational tests, we found that a more reliable approach was to choose the γ that maximizes the likelihood, once the node attributes were available; see §§4 and 5 for examples. Our overall proposed workflow for model comparison is summarized in algorithm 3.
Algorithm 3.
Model comparison.
| Result: Comparison of possible graph structures |
| Input adjacency matrix A; |
| for Candidate spectral methods do |
| Compute node attributes (in our case with magnetic and trophic Laplacian algorithms); |
| Derive the associated random graph model; |
| Calculate maximum likelihood over γ > 0; |
| end |
| Report or compare maximum likelihoods |
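Algorithm 3 is a loop over candidate methods. The skeleton `compare_models` below assumes nothing about the models beyond two user-supplied callables per candidate. The plugged-in magnetic/directed pRDRG and trophic pipelines are compact versions written under our reading of theorems 3.1 and 3.3. For brevity, g is fixed at 1/4; in practice g would also be selected by maximum likelihood, as in §4. The nine-node graph and all function names are hypothetical.

```python
import numpy as np

def compare_models(A, candidates, gammas):
    """Skeleton of algorithm 3: for each candidate method compute node attributes,
    then maximize the log-likelihood of the associated model over a grid of gamma."""
    report = {}
    for name, (attributes, loglik) in candidates.items():
        attrs = attributes(A)                                   # e.g. theta or h
        values = np.array([loglik(A, attrs, gm) for gm in gammas])
        k = int(np.argmax(values))
        report[name] = {"max_loglik": float(values[k]), "gamma": float(gammas[k])}
    return report

def magnetic_angles(A, g=0.25):
    A = A.astype(float)
    W = (A + A.T) / 2
    L = np.diag(W.sum(axis=1)) - np.exp(-2j * np.pi * g * (A - A.T)) * W
    return np.mod(np.angle(np.linalg.eigh(L)[1][:, 0]), 2 * np.pi)

def prdrg_loglik(A, theta, gamma, g=0.25):
    i, j = np.triu_indices(len(theta), k=1)
    b = theta[i] - theta[j]
    c = np.cos(b)
    w = np.exp(2 * gamma * np.stack([c - 1, np.cos(b + 2 * np.pi * g) - c,
                                     np.cos(b - 2 * np.pi * g) - c, 1 - c]))
    P = w / w.sum(axis=0)
    out = np.where(A[i, j] & A[j, i], 0, np.where(A[i, j], 1, np.where(A[j, i], 2, 3)))
    return float(np.sum(np.log(P[out, np.arange(len(i))])))

def trophic_h(A):
    A = A.astype(float)
    Lam = np.diag(A.sum(axis=0) + A.sum(axis=1)) - A - A.T
    return np.linalg.lstsq(Lam, A.sum(axis=0) - A.sum(axis=1), rcond=None)[0]

def trophic_loglik(A, h, gamma):
    P = 1.0 / (1.0 + np.exp(gamma * (h[None, :] - h[:, None] - 1.0) ** 2))
    off = ~np.eye(len(h), dtype=bool)
    return float(np.sum(np.log(np.where(A, P, 1.0 - P)[off])))

# Hypothetical toy input: three groups of three nodes, every edge pointing one group up.
labels = np.repeat(np.arange(3), 3)
A = (labels[None, :] - labels[:, None]) == 1
report = compare_models(A, {"periodic (magnetic, g=1/4)": (magnetic_angles, prdrg_loglik),
                            "linear (trophic)": (trophic_h, trophic_loglik)},
                        gammas=np.linspace(0.5, 10.0, 20))
print(report)
```

Each entry of the report gives the maximum log-likelihood and the grid value of γ at which it is attained. The candidate with the larger maximum is the structure better supported by the data.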
4. Results on synthetic networks
In this section, we demonstrate the model comparison workflow on synthetic networks. These networks are generated using the directed pRDRG model and the trophic RDRG model. Hence, we have a ‘ground truth’ concerning whether a network is more linear or periodic. Note that the magnetic Laplacian algorithm and associated random graph model have a parameter g that controls the spacing between clusters. Therefore, when using the magnetic Laplacian algorithm our first step is to select the parameter g based on the maximum likelihood of the graph.
4.1. Directed pRDRG model
We generate a synthetic network using the directed pRDRG model with K clusters of size m, and hence n = m K nodes. An array of angles $\theta \in \mathbb{R}^n$ is created, forming evenly spaced clusters C1, C2, …, CK. This is achieved by letting θi = (2π(l − 1)/K) + σ if i ∈ Cl, where σ ∼ unif(−a, a) is added noise. We then construct the adjacency matrix according to the probabilities in theorem 3.1 with g = 1/K. We choose m = 100, K = 5, γ = 5 and a = 0.2 and the corresponding adjacency matrix is shown in figure 3a.
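The construction described above can be sketched as follows. The function name is hypothetical, and the outcome probabilities are our reading of theorem 3.1. The four outcomes for each pair are drawn by inverse-CDF sampling.

```python
import numpy as np

def pair_probs(beta, g, gamma):
    # Outcome probabilities (i<->j, i->j only, j->i only, none) for beta = theta_i - theta_j
    c = np.cos(beta)
    w = np.exp(2 * gamma * np.stack([c - 1, np.cos(beta + 2 * np.pi * g) - c,
                                     np.cos(beta - 2 * np.pi * g) - c, 1 - c]))
    return w / w.sum(axis=0)

def directed_prdrg(K, m, gamma, a, rng):
    """K clusters of m nodes; theta_i = 2*pi*(l-1)/K + sigma, sigma ~ unif(-a, a)."""
    n = K * m
    theta = np.repeat(2 * np.pi * np.arange(K) / K, m) + rng.uniform(-a, a, n)
    i, j = np.triu_indices(n, k=1)
    P = pair_probs(theta[i] - theta[j], 1.0 / K, gamma)       # g = 1/K
    u = rng.random(len(i))
    outcome = np.minimum((u > np.cumsum(P, axis=0)).sum(axis=0), 3)  # inverse-CDF draw
    A = np.zeros((n, n), dtype=int)
    A[i, j] = (outcome == 0) | (outcome == 1)                 # edge i -> j present
    A[j, i] = (outcome == 0) | (outcome == 2)                 # edge j -> i present
    return A, theta

A, theta = directed_prdrg(K=5, m=100, gamma=5.0, a=0.2, rng=np.random.default_rng(0))
print(A.shape, A.sum())
```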
Figure 3.
Magnetic Laplacian and trophic Laplacian algorithms applied to a synthetic directed pRDRG. (a) Input adjacency matrix, (b) magnetic Laplacian reordering, (c) trophic Laplacian reordering, (d) likelihood of directed pRDRG, (e) estimated θ and (f) model comparison.
The magnetic Laplacian algorithm is then applied to the adjacency matrix to estimate phase angles and reorder the nodes. The reordered adjacency matrix (figure 3b) recovers the original structure. The trophic Laplacian algorithm is also applied to estimate the trophic level of each node. Figure 3c shows the adjacency matrix reordered by the estimated trophic levels, which hides the original pattern. Intuitively, the trophic Laplacian algorithm is unable to distinguish between these nodes since there is no clear ‘lowest’ or ‘highest’ level among the directed clusters.
Figure 3d illustrates how the optimal parameter g is selected. The plots show the likelihood that the network is generated by a directed pRDRG model for values g = 1/k, assuming we are interested in structures with at most 6 directed clusters (k ≤ 6). We see that g = 1/5 has the highest maximum likelihood, as expected. Consequently, we choose g = 1/5 for the magnetic Laplacian algorithm. In addition, for this value of g, we plot in figure 3e the phase angles estimated with the magnetic Laplacian algorithm against the true phase angles. The linear relationship confirms that the algorithm recovers the five clusters in the presence of noise.
Finally, in figure 3f, we compare the likelihood of a directed pRDRG against the likelihood of a trophic RDRG. Both likelihoods are calculated using several test points for γ. The highest points are highlighted with circles and they correspond to the maximum-likelihood estimators (MLE) for γ. Not surprisingly, in this case, the magnetic Laplacian algorithm achieves a higher maximum. Asterisks highlight the point estimates arising when the expected number of edges is matched to the actual number of edges. We see here, and have also observed in similar experiments, that the maximum-likelihood estimate for γ produces a more accurate result. We also found (numerical experiments not presented here) that the accuracy of both types of γ estimates improves as n increases when using the magnetic Laplacian algorithm.
4.2. The trophic RDRG model
Following on from the previous subsection, we now generate synthetic data by simulating the trophic RDRG model with levels C1, C2, …, CK, where each level has m nodes. In particular, we generate an array of trophic indices $h \in \mathbb{R}^n$, where the total number of nodes is n = m K. We let hi = l + σ if i ∈ Cl for 1 ≤ l ≤ K, where σ ∼ unif(−a, a) is added noise. The edges are then generated according to the probabilities in theorem 3.3. In the following example, we use K = 5, m = 100, a = 0.2 and γ = 5. This generates a network with five clusters forming a linear directed flow, as shown in figure 4a.
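The trophic RDRG can be simulated in the same way; the function name is hypothetical. Each directed edge i → j is drawn independently with probability 1/(1 + e^{γ(h_j − h_i − 1)²}) from theorem 3.3.

```python
import numpy as np

def trophic_rdrg(K, m, gamma, a, rng):
    """K levels of m nodes; h_i = l + sigma for i in level l, sigma ~ unif(-a, a)."""
    n = K * m
    h = np.repeat(np.arange(1, K + 1), m) + rng.uniform(-a, a, n)
    P = 1.0 / (1.0 + np.exp(gamma * (h[None, :] - h[:, None] - 1.0) ** 2))  # P[i, j]
    np.fill_diagonal(P, 0.0)                                  # no self-loops
    A = (rng.random((n, n)) < P).astype(int)                  # independent edges
    return A, h

A, h = trophic_rdrg(K=5, m=100, gamma=5.0, a=0.2, rng=np.random.default_rng(0))
print(A.shape, A.sum())
```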
Figure 4.
Magnetic Laplacian and trophic Laplacian algorithms applied to a synthetic trophic RDRG. (a) Input adjacency matrix, (b) magnetic Laplacian reordering, (c) trophic Laplacian reordering, (d) likelihood of directed pRDRG, (e) estimated trophic level, (f) model comparison.
We see in figure 4c that the trophic Laplacian algorithm recovers the underlying pattern. Figure 4b shows that the magnetic Laplacian algorithm also gives adjacent locations to nodes in the same cluster, and places the clusters in order, modulo a ‘wrap-around’ effect that arises due to its periodic nature. Figure 4d suggests that the optimal magnetic Laplacian parameter is g = 1/6. For this case, it is reasonable that g = 1/K is not identified, since the disconnection between the first and the last cluster contradicts the structure of the directed pRDRG model.
The trophic levels estimated using the trophic Laplacian are consistent with the true trophic levels, as shown by the linear pattern in figure 4e. As expected, the trophic Laplacian produces a higher maximum likelihood for this network (figure 4f), and more accurate MLE and point estimates of γ. We also observe (in similar experiments not presented here) that the accuracy of both estimates improves as n increases when using the trophic Laplacian.
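The estimated trophic levels come from the trophic Laplacian. A minimal sketch, assuming the standard formulation of MacKay et al. [6]: the levels minimize the trophic incoherence Σ_ij A_ij (h_j − h_i − 1)², and setting the gradient to zero gives the linear system Λh = v with Λ = diag(u) − A − Aᵀ, where u is the in-degree plus out-degree and v is the in-degree minus out-degree. Λ is singular (constant vectors lie in its null space), so the sketch takes a least-squares solution and shifts it so that the lowest level is 0; this assumes a weakly connected network, and the function name is our own.

```python
import numpy as np


def trophic_levels(A):
    """Hypothetical helper: trophic levels solving Lambda h = v, shifted so min(h) = 0.

    Assumes A is the 0/1 adjacency matrix of a weakly connected directed graph,
    so the levels are unique up to an additive constant.
    """
    A = np.asarray(A, dtype=float)
    w_in = A.sum(axis=0)                      # in-degrees
    w_out = A.sum(axis=1)                     # out-degrees
    Lam = np.diag(w_in + w_out) - A - A.T     # trophic Laplacian
    v = w_in - w_out                          # imbalance vector
    h, *_ = np.linalg.lstsq(Lam, v, rcond=None)
    return h - h.min()                        # fix the free additive constant


h_chain = trophic_levels(np.array([[0, 1, 0],
                                   [0, 0, 1],
                                   [0, 0, 0]]))
```

For the chain 0 → 1 → 2 every edge can have h_j − h_i = 1 exactly, so the sketch returns levels 0, 1 and 2.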
5. Results on real networks
We now discuss practical use cases for the model comparison tool on a range of real networks. We emphasize that the tool is not designed to discover whether a given directed network has periodic or linear directed hierarchical structure; rather, it aims to quantify which of the two structures is better supported by the data, in a relative sense. Since both models under investigation assume no self-loops, we discard any that are present in the data. Following common practice, we also retain only the largest strongly connected component, to emphasize directed cycles; this ensures that any pair of nodes is connected by a sequence of directed edges in each direction. However, when the strongly connected component contains too few nodes, we analyse the largest weakly connected component instead.
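The preprocessing steps above can be sketched with SciPy's `connected_components`. This is an illustration under assumptions, not the authors' pipeline: the threshold `min_size` for "too few nodes" is a hypothetical parameter, since the text does not give a value, any nonzero entry is treated as an edge, and the function name is our own.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def preprocess(A, min_size=10):
    """Hypothetical helper: drop self-loops, then keep the largest strongly
    connected component, falling back to the largest weakly connected
    component if the strong one has fewer than min_size nodes."""
    A = (np.asarray(A) != 0).astype(int)   # unweighted copy of the input
    np.fill_diagonal(A, 0)                 # both models assume no self-loops
    for connection in ("strong", "weak"):
        _, labels = connected_components(csr_matrix(A), directed=True,
                                         connection=connection)
        largest = np.argmax(np.bincount(labels))   # label of the biggest component
        keep = np.flatnonzero(labels == largest)
        if connection == "weak" or keep.size >= min_size:
            return A[np.ix_(keep, keep)], keep, connection
```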
We give details on four networks, covering examples of both cases: two where periodic structure dominates and two where linear structure dominates. For the first two networks, we show network visualizations to illustrate the results further. In §5.5, we present summary results over 15 networks.
5.1. Food web
In the Florida Bay food web¹ [29], nodes are components of the ecosystem, and unweighted directed edges represent carbon transfer from source nodes to target nodes [30]; usually this means that the target feeds on the source. Besides organisms, the nodes also include non-living components, such as carbon dissolved in the water column. Since we are mainly interested in the relationships between organisms, we remove these non-living components from the network. We analyse the largest strongly connected component of the network, which comprises 12 nodes and 28 edges.
We estimate the phase angles of the nodes using the magnetic Laplacian algorithm with the optimal choice g = 1/3 (figure 5a). Figure 5b compares the likelihood of the food web being generated by the directed pRDRG model with the likelihood of it being generated by the trophic RDRG model, as γ varies. The directed pRDRG model achieves a higher maximum likelihood, suggesting that the structure is more periodic than linear. In figure 5c, the vertical position of each node corresponds to its estimated trophic level. Of the 28 edges, 22 point upwards (shown in blue), while six point downwards (highlighted in red) and so violate the trophic structure. The magnetic Laplacian mapping in figure 5d arranges 26 edges in a counterclockwise direction (shown in blue), with the remaining 2 edges (shown in red) violating the structure by pointing in the reverse orientation.
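Edge counts of this kind can be computed directly from the two embeddings. In this sketch an edge i → j counts as upward when h_j > h_i (ties count as downward), and as counterclockwise when the angular step θ_j − θ_i, taken modulo 2π, lies strictly between 0 and π; this angular criterion and the function names are our own assumptions about how orientation is read off the magnetic eigenmap.

```python
import numpy as np


def count_up_down(A, h):
    """Hypothetical helper: (upward, downward) edge counts for trophic levels h."""
    src, dst = np.nonzero(A)                 # edges i -> j with A[i, j] != 0
    up = int(np.sum(h[dst] > h[src]))
    return up, int(src.size - up)


def count_ccw_cw(A, theta):
    """Hypothetical helper: (counterclockwise, other) edge counts for angles theta.

    Assumed criterion: edge i -> j is counterclockwise when
    (theta_j - theta_i) mod 2*pi lies in the open interval (0, pi).
    """
    src, dst = np.nonzero(A)
    step = np.mod(theta[dst] - theta[src], 2 * np.pi)
    ccw = int(np.sum((step > 0) & (step < np.pi)))
    return ccw, int(src.size - ccw)
```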
Figure 5.
Results for the Florida Bay food web. (a) Likelihood of directed pRDRG, (b) model comparison, (c) estimated trophic level, (d) magnetic eigenmap, (e) trophic Laplacian reordering and (f) magnetic Laplacian reordering.
With g = 1/3, the magnetic Laplacian mapping favours directed cycles through three groups in the food chain, and these are visible in figure 5d, notably between members of three categories: (i) flatfish and other demersal fishes; (ii) lizardfish and eels; and (iii) toadfish and brotalus. Another noticeable distinction is that, because it accounts for reciprocal edges, the magnetic Laplacian mapping positions eels close to lizardfish, and flatfish close to other demersal fishes, whereas the trophic Laplacian mapping places them further apart. In figure 5e,f, we show the reordered adjacency matrices arising from the two algorithms.
5.2. Influence matrix
The influence matrix we study quantifies the influence of selected system factors in the Motueka Catchment of New Zealand [31]. The original influence matrix consists of integer scores between 0 and 5, measuring to what extent the row factors influence the column factors, where a bigger value represents a stronger impact. The system factors and influence scores were developed by pooling the views of local residents. To convert to an unweighted network, we binarize the weights by keeping only the edges between each factor and the factor(s) it influences most strongly. We then select the largest strongly connected component, which comprises 14 nodes and 35 edges.
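The binarization step can be sketched as below, assuming nonnegative scores with rows influencing columns, as described above, and keeping ties (hence "factor(s)"). Rows whose scores are all zero, and the diagonal, produce no edges. The function name is our own, and this is an illustration rather than the authors' preprocessing code.

```python
import numpy as np


def keep_strongest(S):
    """Hypothetical helper: keep edge i -> j only if S[i, j] is the largest
    score in row i (ties kept) and is positive."""
    S = np.array(S, dtype=float)
    np.fill_diagonal(S, 0.0)                  # ignore any self-influence
    row_max = S.max(axis=1, keepdims=True)    # strongest score for each row factor
    return ((S == row_max) & (S > 0)).astype(int)
```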
The optimal parameter for the magnetic Laplacian is g = 1/4 (figure 6a). The mapping from the magnetic Laplacian has a higher maximum likelihood than the trophic Laplacian mapping, indicating a more periodic structure (figure 6b). The trophic Laplacian mapping in figure 6c aims to reveal a hierarchical influence structure. Here, scientific research and economic inputs are assigned lower trophic levels, suggesting that they are the fundamental influencers. The labour market is placed at the top, indicating that it tends to be influenced by other factors. However, there are eight edges, highlighted in red, that point downwards, violating the directed linear structure.
Figure 6.
Results for the Motueka catchment influence matrix. (a) Likelihood of directed pRDRG, (b) model comparison, (c) estimated trophic level, (d) magnetic eigenmap, (f) trophic Laplacian reordering and (g) magnetic Laplacian reordering.
On the other hand, the magnetic Laplacian mapping in figure 6d aims to reveal four directed clusters with phase angles of approximately 0, π/2, π, 3π/2. We highlight the nodes corresponding to ecological factors in red and socio-economic factors in blue. The cluster near π/2 with 6 nodes contains a combination of ecological and socio-economic factors, and includes 6 reciprocal edges between ecological factors and socio-economic factors. Adjacency matrix reorderings are shown in figure 6f,g. Overall, the pattern agrees with the conceptual schematic model proposed in [31, fig. 5a], which we have reproduced in figure 7. This model posits that ecological factors exert influence on socio-economic factors, which in turn influence ecological factors, while the ecological system also influences itself.
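Grouping nodes by phase, as in the four clusters described above, can be sketched by rounding each angle to the nearest multiple of 2πg. The sketch assumes that 1/g is an integer and that the eigenvector's arbitrary global phase has already been rotated so that one cluster sits near 0; both are assumptions, and the function name is our own.

```python
import numpy as np


def phase_clusters(theta, g):
    """Hypothetical helper: index k in {0, ..., 1/g - 1} of the angle 2*pi*g*k
    nearest to each theta (assumes 1/g is an integer)."""
    K = int(round(1 / g))
    k = np.rint(np.mod(theta, 2 * np.pi) / (2 * np.pi * g))
    return np.mod(k, K).astype(int)           # angles just below 2*pi wrap to cluster 0
```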
Figure 7.

Influence matrix schematic graph, based on [31, fig. 5a].
5.3. Yeast transcriptional regulation network
We now analyse a gene transcriptional regulation network² [29] for the yeast S. cerevisiae [32], where a node represents an operon made up of a group of genes in mRNA. An edge from operon i to j indicates that the transcriptional factor encoded by j regulates i. The original network is directed and signed, with signs indicating activation and deactivation. Here, we ignore the signs and only consider the connectivity pattern. Since the largest strongly connected component has very few nodes, we take the largest weakly connected component, which comprises 664 nodes and 1078 edges.
This is a very sparse network, and consequently the log-likelihood of the directed pRDRG (figure 8a) keeps increasing with the decay rate parameter γ over the range we tested. We select g = 1/3 as the optimal parameter for the magnetic Laplacian, and compare the log-likelihoods of the two models in figure 8b. This time the trophic RDRG achieves a higher maximum likelihood, favouring a linear structure.
Figure 8.
Results for a yeast transcriptional regulation network. (a) Likelihood of directed pRDRG and (b) model comparison.
5.4. Caenorhabditis elegans frontal neural network
Caenorhabditis elegans is the only organism whose neural network has been fully mapped. The neural network of C. elegans³ [29] is unweighted and directed, representing connections between neurons and synapses [33]. We investigate its largest strongly connected component, with 109 nodes and 637 edges. The optimal value of the parameter g among the test points is g = 1/5 (figure 9a). The trophic Laplacian algorithm achieves a higher maximum likelihood than the magnetic Laplacian algorithm (figure 9b). This preference for a linear directed structure is consistent with the tube-like shape of the organism [34].
Figure 9.
Caenorhabditis elegans frontal neural network. (a) Likelihood of directed pRDRG and (b) model comparison.
5.5. Other real networks
A summary of further real-world network comparisons is given in table 1.
Table 1.
Comparison summary statistics. Periodic (linear) directed structure is found to be preferred for networks in the first 8 (last 7) rows.
| dataset | nodes | edges | g | ln(P_pRDRG / P_trophic) |
|---|---|---|---|---|
| directed pRDRG (s) | 500 | 49277 | 1/5 | 5.99 × 10⁴ |
| food web (s) [30] | 12 | 28 | 1/3 | 1.17 × 10¹ |
| influence matrix (s) [31] | 14 | 35 | 1/4 | 1.72 × 10¹ |
| US migration (s)ᵃ | 51 | 729 | 1/6 | 5.03 × 10² |
| US IO (s)ᵇ | 31 | 299 | 1/6 | 5.67 × 10¹ |
| trade (s)ᶜ | 17 | 85 | 1/6 | 2.02 × 10¹ |
| transportation (s)ᵈ [29,35] | 456 | 71959 | 1/6 | 4.66 × 10⁴ |
| flight (s)ᵉ | 227 | 23113 | 1/6 | 7.22 × 10³ |
| trophic level graph (w) | 500 | 19956 | 1/6 | −1.63 × 10⁴ |
| C. elegans (s) [33] | 109 | 637 | 1/6 | −4.74 × 10² |
| yeast (w) [32] | 664 | 1078 | 1/3 | −6.46 × 10⁴ |
| political blog (s)ᶠ [36] | 793 | 15781 | 1/5 | −3.42 × 10⁴ |
| shopping basket (w)ᵍ | 27 | 84 | 1/6 | −1.35 × 10² |
| venue reopen (w) [37] | 13 | 19 | 1/6 | −1.82 × 10¹ |
| word adjacency (w)ᶠ [38] | 112 | 425 | 1/6 | −8.21 × 10² |
In the dataset column, (s) and (w) indicate whether the largest strongly or weakly connected component, respectively, is analysed. The fourth column gives the optimal parameter g for the magnetic Laplacian, determined by grid search over the test points g = 1/2, 1/3, 1/4, 1/5, 1/6. The decay parameter γ in the grid search ranges from 0 to 20 with a step size of 0.5. The last column shows the natural logarithm of the ratio between the maximum likelihoods of the directed pRDRG and trophic models. Hence, periodic structure (positive values) is favoured for the networks in the first 8 rows, and linear structure (negative values) for those in the last 7 rows.
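The grid search behind table 1 can be summarized in a few lines. In this sketch `loglik_periodic(g, gamma)` and `loglik_trophic(gamma)` are hypothetical callables returning the log-likelihoods of the two models; we assume that the optimal g is the one whose best log-likelihood over the γ grid is largest, and that the reported statistic is the difference of the two maximized log-likelihoods, which equals ln(P_pRDRG / P_trophic). The names of the grids and the function are our own.

```python
import numpy as np

G_GRID = (1/2, 1/3, 1/4, 1/5, 1/6)            # test points for g
GAMMA_GRID = np.arange(0.0, 20.25, 0.5)       # 0, 0.5, ..., 20 (41 test points)


def compare_models(loglik_periodic, loglik_trophic):
    """Hypothetical helper: return (optimal g, ln of the ratio of maximum likelihoods)."""
    best_periodic = {g: max(loglik_periodic(g, gam) for gam in GAMMA_GRID)
                     for g in G_GRID}
    g_opt = max(best_periodic, key=best_periodic.get)
    best_trophic = max(loglik_trophic(gam) for gam in GAMMA_GRID)
    # Positive values favour periodic structure; negative values favour linear.
    return g_opt, best_periodic[g_opt] - best_trophic
```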
6. Discussion
Spectral methods can be used to extract structures from directed networks, allowing us to detect clusters, rank nodes and visualize patterns. This work exploited a natural connection between spectral methods for directed networks and generative random graph models. We showed that the magnetic Laplacian and the trophic Laplacian can each be associated with a range-dependent random graph model. In the magnetic Laplacian case, the new random graph model has the interesting property that the probabilities of i → j and j → i connections are not independent. Our theoretical analysis provided a workflow for quantifying the relative strength of periodic versus linear directed hierarchy using a likelihood ratio, adding value to the standard approach of visualizing a new graph layout or reordering the adjacency matrix.
We demonstrated the model comparison workflow on synthetic networks, and also showed examples where real networks were categorized as more linear or periodic. The results illustrate the potential for the approach to reveal interesting patterns in networks from ecology, biology, social sciences and other related fields.
There are several promising directions for related future work. It would be of interest to use the likelihood ratios to compare this network feature across a well-defined category in order to address questions such as ‘are results between top chess players more or less periodic than results between top tennis players?’ and ‘does an organism that is more advanced in an evolutionary sense have more periodic connectivity in the brain?’ An extension of the comparison tool to weighted networks should also be possible; here there are notable, and perhaps application-specific, issues about how to generalize and interpret the magnetic Laplacian. Also, the comparison could be extended to include other types of structure, including stochastic block and core–periphery versions [39]. This introduces further challenges of (a) accounting for different numbers of model parameters and (b) dealing with nonlinear spectral methods. Furthermore, by introducing an appropriate null model it may be possible to quantify the presence of linear or periodic hierarchies in absolute, rather than relative, terms.
Supplementary Material
Acknowledgements
The authors thank Colin Singleton from the CountingLab for suggesting the Dunnhumby data used in table 1 and providing advice on data analysis.
Footnotes
Data accessibility
This research made use of public domain data that are available from the Internet, as indicated in the text. Code for the experiments is available at https://github.com/OpalGX/Directed-Network-Laplacians.
Authors' contributions
X.G. carried out the numerical experiments and drafted the manuscript. All authors contributed to the theoretical research, the design of numerical experiments and the completion of the manuscript. All authors have read and approved the manuscript and gave final approval for publication.
Competing interests
The authors declare that there is no conflict of interest.
Funding
X.G. acknowledges support of MAC-MIGS CDT Scholarship under EPSRC grant no. EP/S023291/1. D.J.H. was supported by EPSRC Programme grant no. EP/P020720/1.
References
- 1. Luxburg U. 2007. A tutorial on spectral clustering. Stat. Comput. 17, 395–416. (doi:10.1007/s11222-007-9033-z)
- 2. Strang G. 2019. Linear algebra and learning from data. Wellesley, MA: Wellesley-Cambridge Press.
- 3. Benson AR, Gleich DF, Leskovec J. 2016. Higher-order organization of complex networks. Science 353, 163–166. (doi:10.1126/science.aad9029)
- 4. Cucuringu M, Li H, Sun H, Zanetti L. 2020. Hermitian matrices for clustering directed graphs: insights and applications. In Int. Conf. on Artificial Intelligence and Statistics, pp. 983–992. PMLR.
- 5. Fanuel M, Alaíz CM, Suykens JAK. 2017. Magnetic eigenmaps for community detection in directed networks. Phys. Rev. E 95, 022302. (doi:10.1103/PhysRevE.95.022302)
- 6. MacKay RS, Johnson S, Sansom B. 2020. How directed is a directed network? R. Soc. Open Sci. 7, 201138. (doi:10.1098/rsos.201138)
- 7. Malliaros FD, Vazirgiannis M. 2013. Clustering and community detection in directed networks: a survey. Phys. Rep. 533, 95–142. (doi:10.1016/j.physrep.2013.08.002)
- 8. Sansom B, Johnson S, MacKay RS. 2021. Trophic incoherence drives systemic risk in financial exposure networks. Working Paper no. 39. London, UK: National Institute of Economic and Social Research.
- 9. Chhimwal M, Agrawal S, Kumar G. 2021. Measuring circular supply chain risk: a Bayesian network methodology. Sustainability 13, 8448. (doi:10.3390/su13158448)
- 10. Jasny L, Fisher DR. 2019. Echo chambers in climate science. Environ. Res. Commun. 1, 101003. (doi:10.1088/2515-7620/ab491c)
- 11. Fanuel M, Alaiz CM, Fernandez A, Suykens JAK. 2018. Magnetic eigenmaps for the visualization of directed networks. Appl. Comput. Harmon. Anal. 44, 189–199. (doi:10.1016/j.acha.2017.01.004)
- 12. Higham DJ. 2003. Unravelling small world networks. J. Comput. Appl. Math. 158, 61–74. (doi:10.1016/S0377-0427(03)00471-0)
- 13. Grindrod P. 2002. Range-dependent random graphs and their application to modeling large small-world proteome datasets. Phys. Rev. E 66, 066702. (doi:10.1103/PhysRevE.66.066702)
- 14. Grindrod P, Higham DJ, Kalna G. 2010. Periodic reordering. IMA J. Numer. Anal. 30, 195–207. (doi:10.1093/imanum/drp047)
- 15. Chung F. 1997. Spectral graph theory. Regional Conference Series in Mathematics, no. 92. Providence, RI: American Mathematical Society.
- 16. Higham DJ. 2007. Spectral clustering and its use in bioinformatics. J. Comput. Appl. Math. 204, 25–37. (doi:10.1016/j.cam.2006.04.026)
- 17. Palmer WR, Zheng T. 2020. Spectral clustering for directed networks. In Int. Conf. on Complex Networks and Their Applications, pp. 87–99. New York, NY: Springer.
- 18. Chung F. 2005. Laplacians and the Cheeger inequality for directed graphs. Ann. Comb. 9, 1–19. (doi:10.1007/s00026-005-0237-z)
- 19. Cucuringu M, Tyagi H. 2020. An extension of the angular synchronization problem to the heterogeneous setting. (http://arxiv.org/abs/2012.14932)
- 20. Singer A. 2011. Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal. 30, 20–36. (doi:10.1016/j.acha.2010.02.001)
- 21. Lütkepohl H. 1996. Handbook of matrices. Chichester, UK: Wiley.
- 22. MacKay RS. 2020. Incomplete pairwise comparison. Math. Today 132. See https://cdn.ima.org.uk/wp/wp-content/uploads/2020/07/Incomplete-Pairwise-Comparison-from-MT-Aug20.pdf.
- 23. Gleich DF. 2015. PageRank beyond the Web. SIAM Rev. 57, 321–363. (doi:10.1137/140976649)
- 24. Johnson S. 2020. Digraphs are different: why directionality matters in complex systems. J. Phys.: Complexity 1, 015003. (doi:10.1088/2632-072X/ab8e2f)
- 25. Levine S. 1980. Several measures of trophic structure applicable to complex food webs. J. Theor. Biol. 83, 195–207. (doi:10.1016/0022-5193(80)90288-X)
- 26. Moutsinas G, Shuaib C, Guo W, Jarvis S. 2019. Graph hierarchy: a novel approach to understanding hierarchical structures in complex networks. (https://arxiv.org/abs/1908.04358)
- 27. Kleinberg JM. 2000. Navigation in a small world. Nature 406, 845. (doi:10.1038/35022643)
- 28. Deserno M. 2004. How to generate equidistributed points on the surface of a sphere. Unpublished.
- 29. Leskovec J, Krevl A. 2014. SNAP datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.
- 30. Ulanowicz RE, DeAngelis DL. 2005. Network analysis of trophic dynamics in South Florida ecosystems. US Geol. Survey Program South Florida Ecosystem 114, 45.
- 31. Cole A. 2006. The influence matrix methodology: a technical report. Landcare Research Contract Report: LC0506/175.
- 32. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. 2002. Network motifs: simple building blocks of complex networks. Science 298, 824–827. (doi:10.1126/science.298.5594.824)
- 33. Kaiser M, Hilgetag CC. 2006. Nonoptimal component placement, but short processing paths, due to long-distance projections in neural systems. PLoS Comput. Biol. 2, e95. (doi:10.1371/journal.pcbi.0020095)
- 34. Wood WB (ed.). 1988. The nematode Caenorhabditis elegans. Cold Spring Harbor Monograph Series, 17.
- 35. Frey BJ, Dueck D. 2007. Clustering by passing messages between data points. Science 315, 972–976. (doi:10.1126/science.1136800)
- 36. Adamic LA, Glance N. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proc. 3rd Int. Workshop on Link Discovery, pp. 36–43.
- 37. Benzell SG, Collis A, Nicolaides C. 2020. Rationing social contact during the COVID-19 pandemic: transmission risk and social benefits of US locations. Proc. Natl Acad. Sci. USA 117, 14642–14644. (doi:10.1073/pnas.2008025117)
- 38. Newman MEJ. 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104. (doi:10.1103/PhysRevE.74.036104)
- 39. Tudisco F, Higham DJ. 2019. A nonlinear spectral method for core-periphery detection in networks. SIAM J. Math. Data Sci. 1, 269–292. (doi:10.1137/18M1183558)