Proc. Natl. Acad. Sci. U.S.A. 2020 Mar 2;117(11):5631–5637. doi: 10.1073/pnas.1911030117

The impossibility of low-rank representations for triangle-rich complex networks

C Seshadhri a,1, Aneesh Sharma b, Andrew Stolman a, Ashish Goel c
PMCID: PMC7084093  PMID: 32123073

Significance

Our main message is that the popular method of low-dimensional embeddings provably cannot capture important properties of real-world complex networks. A widely used algorithmic technique for modeling these networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to common wisdom, we argue that such graph embeddings do not capture salient properties of complex networks. We mathematically prove that low-dimensional embeddings cannot generate graphs with both low average degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. This establishes that popular low-dimensional embedding methods fail to capture significant structural aspects of real-world complex networks.

Keywords: graph embeddings, graph representations, low-dimensional embeddings, low-rank representations, singular value decomposition

Abstract

The study of complex networks is a significant development in modern science, and has enriched the social sciences, biology, physics, and computer science. Models and algorithms for such networks are pervasive in our society, and impact human behavior via social networks, search engines, and recommender systems, to name a few. A widely used algorithmic technique for modeling such complex networks is to construct a low-dimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to the common view, we argue that such graph embeddings do not capture salient properties of complex networks. The two properties we focus on are low degree and large clustering coefficients, which have been widely established to be empirically true for real-world networks. We mathematically prove that any embedding (that uses dot products to measure similarity) that can successfully create these two properties must have a rank that is nearly linear in the number of vertices. Among other implications, this establishes that popular embedding techniques such as singular value decomposition and node2vec fail to capture significant structural aspects of real-world complex networks. Furthermore, we empirically study a number of different embedding techniques based on dot product, and show that they all fail to capture the triangle structure.


Complex networks (or graphs) are a fundamental object of study in modern science, across domains as diverse as the social sciences, biology, physics, computer science, and engineering (1–3). Designing good models for these networks is a crucial area of research, and also affects society at large, given the role of online social networks in modern human interaction (4–6). Complex networks are massive, high-dimensional, discrete objects, and are challenging to work with in a modeling context. A common method of dealing with this challenge is to construct a low-dimensional Euclidean embedding that tries to capture the structure of the network (see ref. 7 for a recent survey). Formally, we think of the $n$ vertices as vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$, where $d$ is typically constant (or very slowly growing in $n$). The likelihood of an edge $(i,j)$ is proportional to (usually a nonnegative monotone function in) $v_i \cdot v_j$ (8, 9). This gives a graph distribution that the observed network is assumed to be generated from.

The most important method to get such embeddings is the singular value decomposition (SVD) or other matrix factorizations of the adjacency matrix (8). Recently, there has also been an explosion of interest in using methods from deep neural networks to learn such graph embeddings (9–12) (refer to ref. 7 for more references). Regardless of the specific method, a key goal in building an embedding is to keep the dimension d small—while trying to preserve the network structure—as the embeddings are used in a variety of downstream modeling tasks such as graph clustering, nearest-neighbor search, and link prediction (13). Yet a fundamental question remains unanswered: To what extent do such low-dimensional embeddings actually capture the structure of a complex network?

These models are often justified by treating the (few) dimensions as “interests” of individuals, and using similarity of interests (dot product) to form edges. Contrary to the dominant view, we argue that low-dimensional embeddings are not good representations of complex networks. We demonstrate mathematically and empirically that they lose local structure, one of the hallmarks of complex networks. This runs counter to the ubiquitous use of SVD in data analysis. The weaknesses of SVD have been empirically observed in recommendation tasks (14–16), and our result provides a mathematical validation of these findings.

Let us define the setting formally. Consider a set of vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$ (denoted by the $d \times n$ matrix $V$) used to represent the $n$ vertices in a network. Let $\mathcal{G}_V$ denote the following distribution of graphs over the vertex set $[n]$. For each index pair $i,j$, independently insert (undirected) edge $(i,j)$ with probability $\max(0, \min(v_i \cdot v_j, 1))$. (If $v_i \cdot v_j$ is negative, $(i,j)$ is never inserted; if $v_i \cdot v_j \ge 1$, $(i,j)$ is always inserted.) We will refer to this model as the “embedding” of a graph $G$, and focus on this formulation in our theoretical results. This is a standard model in the literature, and subsumes the classic Stochastic Block Model (17) and Random Dot Product Model (18, 19). There are alternate models that use different functions of the dot product for the edge probability, which are discussed in Alternate Models. Matrix factorization is a popular method to obtain such a vector representation: The original adjacency matrix $A$ is “factorized” as $V^T V$, where the columns of $V$ are $v_1, v_2, \ldots, v_n$.
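To make the generative model concrete, here is a minimal numpy sketch (our own illustration, not code from the paper) of sampling one graph from $\mathcal{G}_V$ with the truncated-dot-product probabilities; for simplicity it skips self-loops.

```python
import numpy as np

def sample_graph(V, seed=None):
    """Sample a graph from G_V: edge (i, j) appears independently with
    probability max(0, min(v_i . v_j, 1)). V is d x n, columns = vertex vectors.
    Returns a symmetric 0/1 adjacency matrix (self-loops omitted)."""
    rng = np.random.default_rng(seed)
    P = np.clip(V.T @ V, 0.0, 1.0)           # truncated dot products
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)             # each unordered pair once
    A = np.zeros((n, n), dtype=np.int8)
    A[iu] = rng.random(len(iu[0])) < P[iu]   # independent Bernoulli trials
    return A + A.T

# Example: 100 vertices embedded in 5 dimensions.
V = np.random.default_rng(0).normal(scale=0.3, size=(5, 100))
A = sample_graph(V, seed=1)
```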

Two hallmarks of real-world graphs are 1) sparsity, where the average degree is typically constant with respect to n, and 2) triangle density, where there are many triangles incident to low-degree vertices (5, 20–22). The large number of triangles is considered a local manifestation of community structure. Triangle counts have a rich history in the analysis and algorithmics of complex networks. Concretely, we measure these properties simultaneously as follows.

Definition 1. For parameters $c > 1$ and $\Delta > 0$, a graph $G$ with $n$ vertices has a $(c, \Delta)$-triangle foundation if there are at least $\Delta n$ triangles contained among vertices of degree at most $c$. Formally, let $S_c$ be the set of vertices of degree at most $c$. Then, the number of triangles in the graph induced by $S_c$ is at least $\Delta n$.

Typically, we think of both $c$ and $\Delta$ as constants. We emphasize that $n$ is the total number of vertices in $G$, not the number of vertices in $S_c$ (as defined above). Refer to the real-world graphs in Table 1. In Fig. 1, we plot the value of $c$ vs. $\Delta$. (Specifically, the $y$ axis is the number of triangles divided by $n$.) This is obtained by simply counting the number of triangles contained in the set of vertices of degree at most $c$. Observe that, for all graphs, for $c \in [10, 50]$, we get a value of $\Delta > 1$ (in many cases, $\Delta > 10$).
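The quantity plotted in Fig. 1 can be computed directly from Definition 1. Below is a small sketch (our own; it uses dense matrices, so it is only suitable for modest graph sizes, and the real datasets would call for sparse arithmetic): for each threshold $c$, it counts triangles in the subgraph induced by $S_c$ via $\mathrm{trace}(B^3)/6$.

```python
import numpy as np

def triangle_foundation_curve(A, thresholds):
    """For each degree threshold c, return Delta = (# triangles among
    vertices of degree <= c) / n, as in Definition 1."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    deltas = []
    for c in thresholds:
        keep = np.flatnonzero(deg <= c)
        B = A[np.ix_(keep, keep)].astype(np.int64)
        tri = np.trace(B @ B @ B) // 6       # trace(B^3) counts each triangle 6x
        deltas.append(tri / n)
    return deltas
```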

Table 1.

Datasets used

Dataset name	Network type	Number of nodes	Number of edges
Facebook (29)	Social network	4,000	88,000
cit-HepPh (31, 32)	Citation	34,000	420,000
String_hs (30)	PPI	19,000	5.6 million
ca-HepPh (29)	Coauthorship	12,000	120,000

All numbers are rounded to one decimal point of precision. PPI, protein–protein interaction.

Fig. 1.

Plots of degree $c$ vs. $\Delta$: For a High Energy Physics coauthorship network, we plot $c$ versus the total number of triangles only involving vertices of degree at most $c$. We divide the latter by the total number of vertices $n$, so it corresponds to $\Delta$, as in Definition 1. We plot these both for the original graph (in thick blue) and for a variety of embeddings (explained in Alternate Models). For each embedding, we plot the maximum $\Delta$ in a set of 100 samples from a 100-dimensional embedding. The embedding analyzed by our main theorem (TDP) is given in thick red. Observe how the embeddings generate graphs with very few triangles among low-degree vertices. The gap in $\Delta$ for low degree is two to three orders of magnitude. The other lines correspond to alternate embeddings, using the node2vec vectors and/or different functions of the dot product.

Our main result is that any embedding that generates graphs with $(c, \Delta)$-triangle foundations, for constant $c$ and $\Delta$, must have near-linear rank. This contradicts the belief that low-dimensional embeddings capture the structure of real-world complex networks.

Theorem 1. Fix $c > 4$ and $\Delta > 0$. Suppose the expected number of triangles in $G \sim \mathcal{G}_V$ that only involve vertices of expected degree at most $c$ is at least $\Delta n$. Then, the rank of $V$ is at least $\min(1, \mathrm{poly}(\Delta/c))\, n/\lg^2 n$.

Equivalently, graphs generated from low-dimensional embeddings cannot contain many triangles only on low-degree vertices. We point out an important implication of this theorem for Stochastic Block Models. In this model, each vertex is modeled as a vector in $[0,1]^d$, where the $i$th entry indicates the likelihood of being in the $i$th community. The probability of an edge is exactly the dot product. In community detection applications, $d$ is thought of as a constant, or at least much smaller than $n$. On the contrary, Theorem 1 implies that $d$ must be $\Omega(n/\lg^2 n)$ to accurately model the low-degree triangle behavior.

Empirical Validation

We empirically validate the theory on a collection of complex networks detailed in Table 1. For each real-world graph, we compute a 100-dimensional embedding through SVD (basically, the top 100 singular vectors of the adjacency matrix). We generate 100 samples of graphs from these embeddings, and compute their c vs. Δ plot. This is plotted with the true c vs. Δ plot. (To account for statistical variation, we plot the maximum value of Δ observed in the samples, over all graphs. The variation observed was negligible.) Fig. 1 shows such a plot for a physics coauthorship network. More results are given in SI Appendix.

Note that this plot is significantly off the mark at low degrees for the embedding. Around the lowest degrees, the value of $\Delta$ (for the graphs generated by the embedding) is two to three orders of magnitude smaller than the original value. This demonstrates that the local triangle structure is destroyed around low-degree vertices. Interestingly, the total number of triangles is preserved well, as shown toward the right side of each plot. Thus, a nuanced view of the triangle distribution, as given in Definition 1, is required to see the shortcomings of low-dimensional embeddings.

Alternate Models

We note that several other functions of the dot product have been proposed in the literature, such as the softmax function (10, 12) and linear models of the dot product (7). Theorem 1 does not have direct implications for such models, but our empirical validation holds for them as well. The embedding in Theorem 1 uses the truncated dot product (TDP) function $\max(0, \min(v_i \cdot v_j, 1))$ to model edge probabilities. We construct other embeddings that compute edge probabilities using machine learning models with the dot product and Hadamard product as features. This subsumes linear models as given in ref. 7. Indeed, the TDP can be smoothly approximated as a logistic function. We also consider (scaled) softmax functions, as in ref. 10, and standard machine learning models [Logistic Regression on the Dot Product (LRDP) and Logistic Regression on the Hadamard Product (LRHP)]. (Details about these models are given in Alternate Graph Models.)

For each of these models (softmax, LRDP, and LRHP), we perform the same experiment described above. Fig. 1 also shows the plots for these other models. Observe that none of them capture the low-degree triangle structure, and their Δ values are all two to three orders of magnitude lower than the original.

In addition (to the extent possible), we compute vector embeddings from a recent deep learning-based method [node2vec (12)]. We again use all of the edge probability models discussed above, and perform an identical experiment (in Fig. 1, these are denoted by “n2v”). Again, we observe that the low-degree triangle behavior is not captured by these deep learned embeddings.

Broader Context

The use of geometric embeddings for graph analysis has a rich history, arguably going back to spectral clustering (23). In recent years, the Stochastic Block Model has become quite popular in the statistics and algorithms community (17), and the Random Dot Product Graph model is a generalization of this notion [refer to recent surveys (19, 24)]. As mentioned earlier, Theorem 1 brings into question the standard uses of these methods to model social networks. The use of vectors to represent vertices is sometimes referred to as latent space models, where geometric proximity models the likelihood of an edge. Although dot products are widely used, we note that some classic latent space approaches use Euclidean distance (as opposed to dot product) to model edge probabilities (25), and this may avoid the lower bound of Theorem 1. Beyond graph analysis, the method of Latent Semantic Indexing also falls in the setting of Theorem 1, wherein we have a low-dimensional embedding of “objects” (like documents), and similarity is measured by dot product (https://en.wikipedia.org/wiki/Latent_semantic_analysis).

High-Level Description of the Proof

In this section, we sketch the proof of Theorem 1. The sketch provides sufficient detail for a reader who wants to understand the reasoning behind our result, but is not concerned with technical details. We will make the simplifying assumption that all $v_i$ have the same length $L$. We note that this setting is interesting in its own right, since it is often the case, in practice, that all vectors are nonnegative and normalized. In this case, we get a stronger rank lower bound that is linear in $n$. Dealing with Varying Lengths provides intuition on how we can remove this assumption. The full details of the proof are given in Proof of Theorem 1.

First, we lower-bound $L$. By Cauchy–Schwarz, $v_i \cdot v_j \le L^2$. Let $X_{i,j}$ be the indicator random variable for the edge $(i,j)$ being present. Observe that all $X_{i,j}$ are independent, and $\mathbb{E}[X_{i,j}] = \min(v_i \cdot v_j, 1) \le L^2$.

The expected number of triangles in $G \sim \mathcal{G}_V$ is

$$\mathbb{E}\Big[\sum_{i \ne j \ne k} X_{i,j} X_{j,k} X_{i,k}\Big] \qquad [1]$$

$$= \sum_{i \ne j \ne k} \mathbb{E}[X_{j,k}]\,\mathbb{E}[X_{i,j}]\,\mathbb{E}[X_{i,k}] \qquad [2]$$

$$\le L^2 \sum_i \sum_{j,k} \mathbb{E}[X_{i,j}]\,\mathbb{E}[X_{i,k}] = L^2 \sum_i \Big(\sum_j \mathbb{E}[X_{i,j}]\Big)^2. \qquad [3]$$

Note that $\sum_j \mathbb{E}[X_{i,j}] = \mathbb{E}[\sum_j X_{i,j}]$ is the expected degree of $i$, which is at most $c$. (Technically, the $X_{i,i}$ term creates a self-loop, so the correct upper bound is $c+1$. For the sake of cleaner expressions, we omit the additive $+1$ in this sketch.)

The expected number of triangles is at least $\Delta n$. Plugging these bounds in,

$$\Delta n \le L^2 \cdot c^2 \cdot n \;\Longrightarrow\; L \ge \sqrt{\Delta}/c. \qquad [4]$$

Thus, the vectors have length at least $\sqrt{\Delta}/c$. Now, we lower-bound the rank of $V$. It will be convenient to deal with the Gram matrix $M = V^T V$, which has the same rank as $V$. Observe that $M_{i,j} = v_i \cdot v_j \le L^2$. We will use the following lemma, stated by Swanepoel (26), which has appeared in numerous forms previously.

Lemma 1 (Rank lemma). Consider any square matrix $M \in \mathbb{R}^{n \times n}$. Then

$$\mathrm{rank}(M) \;\ge\; \frac{\big(\sum_i M_{i,i}\big)^2}{\sum_{i,j} |M_{i,j}|^2}.$$
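The rank lemma is straightforward to sanity-check numerically. The snippet below (our illustration) builds a Gram matrix of known rank and verifies that the right-hand side never exceeds it.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 50))                 # 50 vectors in R^5
M = V.T @ V                                  # Gram matrix, rank(M) <= 5
bound = np.trace(M) ** 2 / np.sum(np.abs(M) ** 2)
rank = np.linalg.matrix_rank(M)
assert bound <= rank + 1e-9                  # rank lemma: bound <= rank
print(f"rank lemma bound {bound:.2f} <= rank {rank}")
```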

Note that $M_{i,i} = v_i \cdot v_i = L^2$, so the numerator is $\big(\sum_i M_{i,i}\big)^2 = n^2 L^4$. The denominator requires more work. We split it into two terms.

$$\sum_{i,j \,:\, v_i \cdot v_j \le 1} (v_i \cdot v_j)^2 \;\le\; \sum_{i,j \,:\, v_i \cdot v_j \le 1} v_i \cdot v_j \;\le\; cn. \qquad [5]$$

If, for $i \ne j$, $v_i \cdot v_j > 1$, then $(i,j)$ is an edge with probability 1. Thus, there can be at most $(c-1)n$ such pairs. Counting the $n$ diagonal pairs as well, there are at most $cn$ pairs such that $v_i \cdot v_j > 1$. So, $\sum_{i,j \,:\, v_i \cdot v_j > 1} (v_i \cdot v_j)^2 \le cnL^4$. Overall, we upper-bound the denominator in the rank lemma by $cn(L^4 + 1)$.

We plug these bounds into the rank lemma. We use the fact that $f(x) = x/(1+x)$ is increasing for positive $x$, and that $L \ge \sqrt{\Delta}/c$.

$$\mathrm{rank}(M) \;\ge\; \frac{n^2 L^4}{cn(L^4+1)} \;\ge\; \frac{n}{c} \cdot \frac{\Delta^2/c^4}{\Delta^2/c^4 + 1} \;=\; \frac{\Delta^2}{c(\Delta^2 + c^4)}\, n.$$

Dealing with Varying Lengths.

The math behind Eq. 4 still holds with the right approximations. Intuitively, the existence of at least $\Delta n$ triangles implies that a sufficiently large number of vectors have length at least $\sqrt{\Delta}/c$. On the other hand, these long vectors need to be “sufficiently far away” from each other to ensure that the vertex degrees remain low. There are many such long vectors, and they can only be far away when their dimension/rank is sufficiently high.

The rank lemma is the main technical tool that formalizes this intuition. When vectors are of varying length, the primary obstacle is the presence of extremely long vectors that create triangles. The numerator in the rank lemma sums the $M_{i,i}$, which are the squared lengths of the vectors. A small set of extremely long vectors could dominate the sum, increasing the numerator. In that case, we do not get a meaningful rank bound.

But, because the vectors inhabit a low-dimensional space, the long vectors cannot avoid interacting with each other. We prove a “packing” lemma (Lemma 5) showing that there must be many large positive dot products among a set of extremely long vectors. Thus, many of the corresponding vertices have large degree, and triangles incident to these vertices do not contribute to low-degree triangles. Operationally, the main proof uses the packing lemma to show that there are few long vectors. These can be removed without affecting the low-degree structure. One can then perform a binning (or “rounding”) of the lengths of the remaining vectors, to implement the proof described in the above section.

Proof of Theorem 1

For convenience, we restate the setting. Consider a set of vectors $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$ that represent the vertices of a social network. We will also use the matrix $V \in \mathbb{R}^{d \times n}$ for these vectors, where each column is one of the $v_i$. Abusing notation, we will use $V$ to represent both the set of vectors and the matrix. We will refer to the vertices by their indices in $[n]$.

Let $\mathcal{G}_V$ denote the following distribution of graphs over the vertex set $[n]$. For each index pair $i,j$, independently insert (undirected) edge $(i,j)$ with probability $\max(0, \min(v_i \cdot v_j, 1))$.

The Basic Tools.

We now state some results that will be used in the final proof. Lemma 2 is an existing result. For all other statements, the proofs are provided in SI Appendix.

Lemma 2 [Rank lemma (26)]. Consider any square matrix $A \in \mathbb{R}^{n \times n}$. Then

$$\Big(\sum_i A_{i,i}\Big)^2 \;\le\; \mathrm{rank}(A) \cdot \sum_{i,j} |A_{i,j}|^2.$$

Lemma 3. Consider a set of $s$ vectors $w_1, w_2, \ldots, w_s$ in $\mathbb{R}^d$. Then

$$\sum_{\substack{(i,j) \in [s] \times [s] \\ w_i \cdot w_j < 0}} |w_i \cdot w_j| \;\le\; \sum_{\substack{(i,j) \in [s] \times [s] \\ w_i \cdot w_j > 0}} |w_i \cdot w_j|.$$

Recall that an independent set is a collection of vertices that induce no edge.

Lemma 4. Any graph with $h$ vertices and maximum degree $b$ has an independent set of size at least $h/(b+1)$.

Proposition 1. Consider the distribution $\mathcal{G}_V$. Let $D_i$ denote the degree of vertex $i \in [n]$. Then $\mathbb{E}[D_i^2] \le \mathbb{E}[D_i] + \mathbb{E}[D_i]^2$.

A key component of dealing with arbitrary-length vectors is the following dot product lemma. It is inspired by results of Alon (27) and Tao (28), who get a stronger lower bound of $1/\sqrt{d}$ for the absolute values of the dot products.

Lemma 5. Consider any set of $4d$ unit vectors $u_1, u_2, \ldots, u_{4d}$ in $\mathbb{R}^d$. There exists some $i \ne j$ such that $u_i \cdot u_j \ge 1/(4d)$.
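Lemma 5 is easy to probe empirically. Random unit vectors are far from the extremal configuration, but the guaranteed pair must exist in every set of $4d$ unit vectors, so even the worst random sample should clear the $1/(4d)$ bound. A small sketch (our own illustration):

```python
import numpy as np

def min_over_trials_of_max_dot(d, trials=20, seed=0):
    """Repeatedly draw 4d random unit vectors in R^d and record the largest
    pairwise dot product; Lemma 5 says it is always >= 1/(4d)."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        U = rng.normal(size=(4 * d, d))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        G = U @ U.T
        np.fill_diagonal(G, -np.inf)         # ignore u_i . u_i = 1
        worst = min(worst, G.max())
    return worst

d = 50
print(min_over_trials_of_max_dot(d), ">=", 1 / (4 * d))
```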

The Main Argument.

The proof is by contradiction. We assume that the expected number of triangles contained in the set of vertices of expected degree at most $c$ is at least $\Delta n$. We remind the reader that $n$ is the total number of vertices. For convenience, we simply remove the vectors corresponding to vertices with expected degree more than $c$. Let $\hat{V}$ be the matrix of the remaining vectors, and we focus on $\mathcal{G}_{\hat{V}}$. The expected number of triangles in $G \sim \mathcal{G}_{\hat{V}}$ is at least $\Delta n$.

The overall proof can be thought of in three parts.

Part 1, remove extremely long vectors: Our final aim is to use the rank lemma (Lemma 2) to lower bound the rank of V. The first problem we encounter is that extremely long vectors can dominate the expressions in the rank lemma, and we do not get useful bounds. We show that the number of such long vectors is extremely small, and they can be removed without affecting too many triangles. In addition, we can also remove extremely small vectors, since they cannot participate in many triangles.

Part 2, find a “core” of sufficiently long vectors that contains enough triangles: The previous step gets a “cleaned” set of vectors. Now, we bucket these vectors by length. We show that there is a large bucket, with vectors that are sufficiently long, such that there are enough triangles contained in this bucket.

Part 3, apply the rank lemma to the “core”: We now focus on this core of vectors, where the rank lemma can be applied. At this stage, the mathematics shown in High-Level Description of the Proof can be carried out almost directly.

Now for the formal proof. For the sake of contradiction, we assume that $d = \mathrm{rank}(\hat{V}) < \alpha(\Delta^4/c^9)\, n/\lg^2 n$ (for some sufficiently small constant $\alpha > 0$).

Part 1: Removing extremely long (and extremely short) vectors

We begin by showing that there cannot be many long vectors in V^.

Lemma 6. There are at most $5cd$ vectors of length at least $2\sqrt{n}$.

Proof. Let $L$ be the set of “long” vectors, those with length at least $2\sqrt{n}$. We argue by contradiction, so assume there are more than $5cd$ long vectors. Consider a graph $H = (L, E)$, where vectors $v_i, v_j \in L$ ($i \ne j$) are connected by an edge if $\frac{v_i}{\|v_i\|_2} \cdot \frac{v_j}{\|v_j\|_2} \ge \frac{1}{4n}$. We choose the $1/4n$ bound to ensure that all edges in $H$ are edges in $G$.

Formally, for any edge $(i,j)$ in $H$, $v_i \cdot v_j \ge \|v_i\|_2 \|v_j\|_2 / 4n \ge (2\sqrt{n})^2 / 4n = 1$. So $(i,j)$ is an edge with probability 1 in $G \sim \mathcal{G}_{\hat{V}}$. Hence the degree of any vertex in $H$ is at most $c$. By Lemma 4, $H$ contains an independent set $I$ of size at least $5cd/(c+1) \ge 4d$. Consider an arbitrary sequence of $4d$ (normalized) vectors in $I$: $u_1, \ldots, u_{4d}$. Applying Lemma 5 to this sequence, we deduce the existence of $i \ne j$ in $I$ such that $\frac{v_i}{\|v_i\|_2} \cdot \frac{v_j}{\|v_j\|_2} \ge \frac{1}{4d} \ge \frac{1}{4n}$. But then the edge $(i,j)$ should be present in $H$, contradicting the fact that $I$ is an independent set.

Denote by $V'$ the set of all vectors in $\hat{V}$ with length in the range $[n^{-2}, 2\sqrt{n}]$.

Proposition 2. The expected degree of every vertex in $G \sim \mathcal{G}_{V'}$ is at most $c$, and the expected number of triangles in $G$ is at least $\Delta n/2$.

Proof. Since removal of vectors can only decrease degrees, the expected degree of every vertex in $\mathcal{G}_{V'}$ is, naturally, at most $c$. It remains to bound the expected number of triangles in $G \sim \mathcal{G}_{V'}$. By removing the vectors in $\hat{V} \setminus V'$, we potentially lose some triangles. Let us categorize these into those that involve at least one “long” vector (length at least $2\sqrt{n}$) and those that involve at least one “short” vector (length at most $n^{-2}$) but no long vector.

We start with the first type. By Lemma 6, there are at most $5cd$ long vectors. For any vertex, the expected number of triangles incident to that vertex is at most the expectation of its squared degree. By Proposition 1, the expected squared degree is at most $c + c^2 \le 2c^2$. Thus, the expected total number of triangles of the first type is at most $5cd \times 2c^2 \le \Delta n/\lg^2 n$.

Consider any triple of vectors $(u, v, w)$ where $u$ is short and neither of the others is long. The probability that this triple forms a triangle is at most

$$\min(u \cdot v, 1)\,\min(u \cdot w, 1) \;\le\; \min(\|u\|_2 \|v\|_2, 1)\,\min(\|u\|_2 \|w\|_2, 1) \;\le\; (n^{-2} \cdot 2\sqrt{n})^2 = 4n^{-3}.$$

Summing over all (at most $n^3$) such triples, the expected number of such triangles is at most 4.

Thus, the expected number of triangles in $G \sim \mathcal{G}_{V'}$ is at least $\Delta n - \Delta n/\lg^2 n - 4 \ge \Delta n/2$.

Part 2: Finding core of sufficiently long vectors with enough triangles

For any integer $r$, let $V_r$ be the set of vectors $\{v \in V' : \|v\|_2 \in [2^r, 2^{r+1})\}$. Observe that the $V_r$ form a partition of $V'$. Since all lengths in $V'$ are in the range $[n^{-2}, 2\sqrt{n}]$, there are at most $3\lg n$ nonempty $V_r$. Let $R$ be the set of indices $r$ such that $|V_r| \ge (\Delta/60c^2)(n/\lg n)$. Furthermore, let $V''$ be $\bigcup_{r \in R} V_r$.

Proposition 3. The expected number of triangles in $G \sim \mathcal{G}_{V''}$ is at least $\Delta n/8$.

Proof. The total number of vectors in $\bigcup_{r \notin R} V_r$ is at most $3\lg n \times (\Delta/60c^2)(n/\lg n) \le (\Delta/20c^2)n$. By Proposition 1 and linearity of expectation, the expected sum of squared degrees of all vectors in $\bigcup_{r \notin R} V_r$ is at most $(c + c^2) \times (\Delta/20c^2)n \le \Delta n/10$. Since the expected number of triangles in $G \sim \mathcal{G}_{V'}$ is at least $\Delta n/2$ (Proposition 2) and the expected number of triangles incident to vectors in $V' \setminus V''$ is at most $\Delta n/10$, the expected number of triangles in $G \sim \mathcal{G}_{V''}$ is at least $\Delta n/2 - \Delta n/10 \ge \Delta n/8$.

We now come to an important proposition. Because the expected number of triangles in $G \sim \mathcal{G}_{V''}$ is large, we can prove that $V''$ must contain vectors of at least constant length.

Proposition 4. $\max_{r \in R} 2^r \ge \sqrt{\Delta}/4c$.

Proof. Suppose not. Then every vector in $V''$ has length at most $\sqrt{\Delta}/4c$. By Cauchy–Schwarz, for every pair $u, v \in V''$, $u \cdot v \le \Delta/16c^2$. Let $I$ denote the set of vector indices in $V''$ (this corresponds to the vertices in $G \sim \mathcal{G}_{V''}$). For any two vertices $i \ne j \in I$, let $X_{i,j}$ be the indicator random variable for the edge $(i,j)$ being present. The expected number of triangles incident to vertex $i$ in $G \sim \mathcal{G}_{V''}$ is

$$\mathbb{E}\Big[\sum_{j \ne k \in I} X_{i,j} X_{i,k} X_{j,k}\Big] = \sum_{j \ne k \in I} \mathbb{E}[X_{i,j} X_{i,k}]\,\mathbb{E}[X_{j,k}].$$

Observe that $\mathbb{E}[X_{j,k}]$ is at most $v_j \cdot v_k \le \Delta/16c^2$. Furthermore, $\sum_{j \ne k \in I} \mathbb{E}[X_{i,j} X_{i,k}] \le \mathbb{E}[D_i^2]$ (recall that $D_i$ is the degree of vertex $i$). By Proposition 1, this is at most $c + c^2 \le 2c^2$. The expected number of triangles in $G \sim \mathcal{G}_{V''}$ is then at most $n \times 2c^2 \times \Delta/16c^2 = \Delta n/8$. This contradicts Proposition 3.

Part 3: Applying the rank lemma to the core

We are ready to apply the rank bound of Lemma 2 to prove the final result. The following lemma contradicts our initial bound on the rank d, completing the proof. We will omit some details in the following proof, and provide a full proof in SI Appendix.

Lemma 7. $\mathrm{rank}(V'') \ge (\alpha\Delta^4/c^9)\, n/\lg^2 n$.

Proof. It is convenient to denote the index set of $V''$ by $I$. Let $M$ be the Gram matrix $(V'')^T (V'')$; so, for $i,j \in I$, $M_{i,j} = v_i \cdot v_j$. By Lemma 2, $\mathrm{rank}(V'') = \mathrm{rank}(M) \ge \big(\sum_{i \in I} M_{i,i}\big)^2 / \sum_{i,j \in I} |M_{i,j}|^2$. Note that $M_{i,i}$ is $\|v_i\|_2^2$, which is at least $2^{2r}$ for $v_i \in V_r$. Let us denote $\max_{r \in R} 2^r$ by $L$, so all vectors in $V''$ have length at most $2L$. By Cauchy–Schwarz, all entries in $M$ are at most $4L^2$.

We lower-bound the numerator:

$$\Big(\sum_{i \in I} \|v_i\|_2^2\Big)^2 \;\ge\; \Big(\sum_{r \in R} 2^{2r} |V_r|\Big)^2 \;\ge\; \Big(\max_{r \in R} 2^{2r} \cdot (\Delta/60c^2)(n/\lg n)\Big)^2 \;=\; L^4 (\Delta^2/3600c^4)(n^2/\lg^2 n).$$

A series of technical calculations is needed to upper-bound the denominator, $\sum_{i,j \in I} |M_{i,j}|^2$. These details are provided in SI Appendix. The main upshot is that we can prove $\sum_{i,j \in I} |M_{i,j}|^2 \le 128cn(1 + 16L^4)$.

Crucially, by Proposition 4, $L \ge \sqrt{\Delta}/4c$. Thus, $4^4 c^4 L^4/\Delta^2 \ge 1$. Combining all of the bounds (and setting $\alpha < 1/(128 \cdot 3600 \cdot 4^4)$),

$$\mathrm{rank}(V'') \;\ge\; \frac{L^4 (\Delta^2/3600c^4)(n^2/\lg^2 n)}{128cn(1 + 16L^4)} \;\ge\; \frac{L^4 (\Delta^2/3600c^4)(n/\lg^2 n)}{128c(4^4 c^4 L^4/\Delta^2 + 16L^4)} \;\ge\; (\alpha\Delta^4/c^9)(n/\lg^2 n).$$

Details of Empirical Results

Data Availability.

The datasets used are summarized in Table 1. We present here four publicly available datasets from different domains. The ca-HepPh is a coauthorship network, Facebook is a social network, and cit-HepPh is a citation network, all obtained from the SNAP graph database (29). The String_hs dataset is a protein–protein interaction network obtained from ref. 30. (The citations provide the link to obtain the corresponding datasets.)

We first describe the primary experiment, used to validate Theorem 1 on the SVD embedding. We generated a $d$-dimensional embedding for various values of $d$ using the SVD. Let $G$ be a graph with the $n \times n$ (symmetric) adjacency matrix $A$, with eigendecomposition $\Psi\Lambda\Psi^T$. Let $\Lambda_d$ be the $d \times d$ diagonal matrix with the $d$ largest-magnitude eigenvalues of $A$ along the diagonal. Let $\Psi_d$ be the $n \times d$ matrix with the corresponding eigenvectors as columns. We compute the matrix $A_d = \Psi_d \Lambda_d \Psi_d^T$ and refer to this as the $d$-dimensional spectral embedding of $G$. This is the standard principal components analysis (PCA) approach.

From the spectral embedding, we generate a graph from $A_d$ by considering every pair of vertices $(i,j)$ and generating a random value in $[0,1]$. If the $(i,j)$th entry of $A_d$ is greater than the random value generated, the edge is added to the graph; otherwise, the edge is not present. This is the same as taking $A_d$, setting all negative values to 0 and all values $>1$ to 1, and performing Bernoulli trials for each edge with the resulting probabilities. In all of the figures, this is referred to as the “SVD TDP” embedding.
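A minimal numpy sketch of this procedure (our own illustration, assuming the adjacency matrix fits in memory as a dense array; for graphs the size of those in Table 1, one would compute the top eigenpairs with a sparse solver such as scipy.sparse.linalg.eigsh instead):

```python
import numpy as np

def svd_tdp_sample(A, d=100, seed=None):
    """Rank-d spectral embedding of A followed by TDP sampling: keep the d
    largest-magnitude eigenpairs, clip A_d to [0, 1], run Bernoulli trials."""
    rng = np.random.default_rng(seed)
    vals, vecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(np.abs(vals))[::-1][:d]       # d largest |eigenvalues|
    Ad = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    P = np.clip(Ad, 0.0, 1.0)
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    G = np.zeros((n, n), dtype=np.int8)
    G[iu] = rng.random(len(iu[0])) < P[iu]
    return G + G.T
```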

Triangle Distributions.

To generate Figs. 1 and 2, we calculated the number of triangles incident to vertices of different degrees in both the original graphs and the graphs generated from the embeddings. Each of the plots shows the number of triangles in the graph on the vertical axis and the degrees of vertices on the horizontal axis. Each curve corresponds to some graph, and each point $(x, y)$ in a given curve shows that the graph contains $y$ triangles if we remove all vertices with degree greater than $x$. We then generate 100 random samples from the 100-dimensional embedding, as given by SVD (described above). For each value of $c$, we plot the maximum value of $\Delta$ over all of the samples. This is to ensure that our results are not affected by statistical variation (which was quite minimal).

Fig. 2.

Plots of degree $c$ vs. $\Delta$: For each network, we plot $c$ versus the total number of triangles only involving vertices of degree at most $c$. We divide the latter by the number of vertices, so it corresponds to $\Delta$, as in the main definition. In each plot, we plot these for both the original graph and the maximum $\Delta$ in a set of 100 samples from a 100-dimensional embedding. Observe how the embeddings generate graphs with very few triangles among low-degree vertices. The gap in $\Delta$ for low degree is two to three orders of magnitude in all instances.

Alternate Graph Models.

We consider three other functions of the dot product, to construct graph distributions from the vector embeddings. Details on parameter settings and the procedure used for the optimization are given in SI Appendix.

LRDP.

We consider the probability of an edge $(i,j)$ to be the logistic function $L(1 + \exp(-k(v_i \cdot v_j - x_0)))^{-1}$, where $L, k, x_0$ are parameters. Observe that the range of this function is $[0,1]$, and hence its values can be interpreted as probabilities. We tune these parameters to fit the expected number of edges to the true number of edges. Then, we proceed as in the TDP experiment. We note that the TDP can be approximated by a logistic function, and thus the LRDP embedding is a “closer fit” to the graph than the TDP embedding.
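As a concrete illustration of the LRDP recipe, here is a minimal sketch in which we fix the slope and tune only a bias term by root-finding, so that the expected edge count matches the true count. The paper tunes all of $L, k, x_0$ (details in SI Appendix), so this one-parameter fit is our simplification.

```python
import numpy as np
from scipy.optimize import brentq

def lrdp_probs(V, m_true, k=1.0):
    """Edge probability sigma(k * (v_i . v_j) + b): a logistic function of the
    dot product, with bias b calibrated so the expected #edges equals m_true."""
    D = V.T @ V
    iu = np.triu_indices(D.shape[0], k=1)
    dots = D[iu]
    probs = lambda b: 1.0 / (1.0 + np.exp(-(k * dots + b)))
    b = brentq(lambda b: probs(b).sum() - m_true, -50.0, 50.0)
    # (brentq assumes the expected edge count brackets m_true on [-50, 50])
    P = np.zeros_like(D)
    P[iu] = probs(b)
    return P + P.T
```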

LRHP.

This is inspired by linear models used on low-dimensional embeddings (7). Define the Hadamard product $v_i \odot v_j$ to be the $d$-dimensional vector whose $r$th coordinate is the product of the $r$th coordinates of $v_i$ and $v_j$. We now fit a logistic function over linear functions of (the coordinates of) $v_i \odot v_j$. This is a significantly richer model than the previous one, which uses a fixed linear function (the sum). Again, we tune parameters to match the number of edges.
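A minimal LRHP sketch (our own, assuming a standard positive/negative-sampling setup for training the classifier; the exact procedure used in the paper is in SI Appendix and may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_lrhp(V, A, seed=0):
    """Logistic regression on Hadamard-product features v_i * v_j.
    Positives: observed edges. Negatives: sampled non-edge pairs."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    pos = np.argwhere(np.triu(A, k=1) > 0)
    cand = rng.integers(0, n, size=(4 * len(pos), 2))        # oversample pairs,
    mask = (cand[:, 0] < cand[:, 1]) & (A[cand[:, 0], cand[:, 1]] == 0)
    neg = cand[mask][: len(pos)]                             # keep non-edges
    pairs = np.vstack([pos, neg])
    X = V[:, pairs[:, 0]].T * V[:, pairs[:, 1]].T            # Hadamard features
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    return LogisticRegression(max_iter=1000).fit(X, y)
```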

Softmax.

This is inspired by low-dimensional embeddings for random walk matrices (10, 12). The idea is to make the probability of edge $(i,j)$ proportional to the softmax value $\exp(v_i \cdot v_j)/\sum_{k \in [n]} \exp(v_i \cdot v_k)$. This tends to push edge formation even for slightly higher dot products, and one might imagine this helps triangle formation. We set the proportionality constant separately for each vertex to ensure that the expected degree matches the true degree. The resulting probability matrix is technically directed, so we symmetrize it.
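A sketch of the softmax model as described above (our own reading; the exact per-vertex scaling and the final clipping are our assumptions):

```python
import numpy as np

def softmax_probs(V, degrees):
    """Row i gets weights exp(v_i . v_j), normalized over j (softmax), then
    scaled so row i sums to the true degree of i; symmetrize at the end."""
    D = V.T @ V
    np.fill_diagonal(D, -np.inf)                     # exclude self-loops
    W = np.exp(D - D.max(axis=1, keepdims=True))     # numerically stable softmax
    W /= W.sum(axis=1, keepdims=True)
    P = W * np.asarray(degrees)[:, None]             # expected degree = true degree
    return np.clip((P + P.T) / 2, 0.0, 1.0)          # symmetrize, clip to [0, 1]
```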

node2vec experiments.

We also applied node2vec, a recent deep learning-based graph embedding method (12), to generate vector representations of the vertices. We use default parameters to run node2vec. (More details are provided in SI Appendix.) The node2vec algorithm tries to model the random walk matrix associated with a graph, not the raw adjacency matrix. The dot products $v_i \cdot v_j$ between the output vectors are used to model the random walk probability of going from $i$ to $j$, rather than the presence of an edge. It does not make sense to apply the TDP function to these dot products, since this would generate (in expectation) only $n$ edges (one for each vertex). We instead apply the LRDP and LRHP functions, which use the node2vec vectors as inputs to a machine learning model that predicts edges.

In Figs. 1 and 2, we show results for all of the datasets. We note that, for all datasets and all embeddings, the models fail to capture the low-degree triangle behavior.

Degree Distributions.

We observe that the low-dimensional embeddings obtained from SVD and TDP can capture the degree distribution accurately. In Fig. 3, we plot the degree distribution (in log-log scale) of the original graph against the expected degree distribution of the embedding. For each vertex $i$, we can compute its expected degree as the sum $\sum_j p_{ij}$, where $p_{ij}$ is the probability of the edge $(i,j)$. In all cases, the expected degree distribution is close to the true degree distribution, even for lower-degree vertices. The embedding successfully captures the “first-order” connections (degrees), but not the higher-order connections (triangles). We believe that this reinforces the need to look at the triangle structure to discover the weaknesses of low-dimensional embeddings.
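The expected degrees are a one-line computation from any of the edge-probability matrices above; a sketch:

```python
import numpy as np

def expected_degrees(P):
    """E[D_i] = sum over j != i of p_ij, for an edge-probability matrix P."""
    Q = P.copy()
    np.fill_diagonal(Q, 0.0)
    return Q.sum(axis=1)
```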

Fig. 3.

Plots of degree distributions: For each network, we plot the true degree distribution vs. the expected degree distribution of a 100-dimensional embedding. Observe how the embedding does capture the degree distribution quite accurately at all scales.

Supplementary Material

Supplementary File
pnas.1911030117.sapp.pdf (269.7KB, pdf)

Acknowledgments

C.S. acknowledges the support of NSF Awards CCF-1740850 and CCF-1813165, and ARO Award W911NF1910294.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission. M.E.J.N. is a guest editor invited by the Editorial Board.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1911030117/-/DCSupplemental.

References

1. Wasserman S., Faust K., Social Network Analysis: Methods and Applications (Cambridge University Press, 1994).
2. Newman M. E. J., The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003).
3. Easley D., Kleinberg J., Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Cambridge University Press, 2010).
4. Barabási A. L., Albert R., Emergence of scaling in random networks. Science 286, 509–512 (1999).
5. Watts D., Strogatz S., Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
6. Chakrabarti D., Faloutsos C., Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 2 (2006).
7. Hamilton W. L., Ying R., Leskovec J., “Inductive representation learning on large graphs” in Neural Information Processing Systems, NIPS’17, Guyon I., von Luxburg U., Bengio S., Wallach H. M., Fergus R., Vishwanathan S. V. N., Garnett R., Eds. (Curran Associates Inc., 2017), pp. 1025–1035.
8. Ahmed A., Shervashidze N., Narayanamurthy S., Josifovski V., Smola A. J., “Distributed large-scale natural graph factorization” in Conference on World Wide Web, Almeida V. A. F., Glaser H., Baeza-Yates R., Moon S. B., Eds. (ACM, 2013), pp. 37–48.
9. Cao S., Lu W., Xu Q., “Deep neural networks for learning graph representations” in AAAI Conference on Artificial Intelligence, Schuurmans D., Wellman M. P., Eds. (Association for the Advancement of Artificial Intelligence, 2016), pp. 1145–1152.
10. Perozzi B., Al-Rfou R., Skiena S., “DeepWalk: Online learning of social representations” in SIGKDD Conference on Knowledge Discovery and Data Mining, Macskassy S. A., Perlich C., Leskovec J., Wang W., Ghani R., Eds. (Association for Computing Machinery, 2014), pp. 701–710.
11. Tang J., et al., “LINE: Large-scale information network embedding” in Conference on World Wide Web, Gangemi A., Leonardi S., Panconesi A., Eds. (ACM, 2015), pp. 1067–1077.
12. Grover A., Leskovec J., “node2vec: Scalable feature learning for networks” in SIGKDD Conference on Knowledge Discovery and Data Mining, Krishnapuram B., Shah M., Smola A. J., Aggarwal C. C., Shen D., Rastogi R., Eds. (Association for Computing Machinery, 2016), pp. 855–864.
13. @twittereng, Embeddings@Twitter. https://blog.twitter.com/engineering/en_us/topics/insights/2018/embeddingsattwitter.html.
14. Bahmani B., Chowdhury A., Goel A., Fast incremental and personalized PageRank. Proc. VLDB Endowment 4, 173–184 (2010).
15. Gupta P., et al., “WTF: The Who to Follow service at Twitter” in Conference on World Wide Web, Gangemi A., Leonardi S., Panconesi A., Eds. (ACM, 2013), pp. 505–514.
16. Kloumann I. M., Ugander J., Kleinberg J., Block models and personalized PageRank. Proc. Natl. Acad. Sci. U.S.A. 114, 33–38 (2017).
17. Holland P. W., Laskey K., Leinhardt S., Stochastic blockmodels: First steps. Soc. Networks 5, 109–137 (1983).
18. Young S. J., Scheinerman E. R., “Random dot product graph models for social networks” in Algorithms and Models for the Web-Graph, Bonato A., Chung F. R. K., Eds. (Springer, 2007), pp. 138–149.
19. Athreya A., et al., Statistical inference on random dot product graphs: A survey. J. Mach. Learn. Res. 18, 1–92 (2018).
20. Sala A., et al., “Measurement-calibrated graph models for social network experiments” in Conference on World Wide Web, Rappa M., Jones P., Freire J., Chakrabarti S., Eds. (ACM, 2010), pp. 861–870.
21. Seshadhri C., Kolda T. G., Pinar A., Community structure and scale-free collections of Erdös-Rényi graphs. Phys. Rev. E 85, 056109 (2012).
22. Durak N., Pinar A., Kolda T. G., Seshadhri C., “Degree relations of triangles in real-world networks and graph models” in Conference on Information and Knowledge Management (CIKM), Chen X.-w., Lebanon G., Wang H., Zaki M. J., Eds. (ACM, 2012), pp. 1712–1716.
23. Fiedler M., Algebraic connectivity of graphs. Czech. Math. J. 23, 298–305 (1973).
24. Abbe E., Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 18, 1–86 (2018).
25. Hoff P. D., Raftery A. E., Handcock M. S., Latent space approaches to social network analysis. J. Am. Stat. Assoc. 97, 1090–1098 (2002).
26. Swanepoel K., The rank lemma. https://konradswanepoel.wordpress.com/2014/03/04/the-rank-lemma/.
27. Alon N., Problems and results in extremal combinatorics, part I. Discrete Math. 273, 31–53 (2003).
28. Tao T., A cheap version of the Kabatjanskii-Levenstein bound for almost orthogonal vectors. https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of-the-kabatjanskii-levenstein-bound-for-almost-orthogonal-vectors/.
29. Leskovec J., Stanford Network Analysis Project. http://snap.stanford.edu/. Accessed 1 November 2019.
30. STRING Consortium, STRING database. http://version10.string-db.org/. Accessed 1 November 2019.
31. Tang J., Zhang J., Yao L., Li J., Zhang L., Su Z., Citation network dataset. https://aminer.org/citation. Accessed 1 November 2019.
32. Tang J., Zhang J., Yao L., Li J., Zhang L., Su Z., “ArnetMiner: Extraction and mining of academic social networks” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Li Y., Liu B., Sarawagi S., Eds. (ACM, 2008), pp. 990–998.


