Matchability of heterogeneous networks pairs

Vince Lyzinski; Daniel L Sussman

doi:10.1093/imaiai/iaz031

2020 Jan 6;9(4):749–783.



PMCID: PMC7737166 PMID: 33343893

Abstract

We consider the problem of graph matchability in non-identically distributed networks. In a general class of edge-independent networks, we demonstrate that graph matchability can be lost with high probability when matching the networks directly. We further demonstrate that under mild model assumptions, matchability is almost perfectly recovered by centering the networks using universal singular value thresholding before matching. These theoretical results are then demonstrated in both real and synthetic simulation settings. We also recover analogous core-matchability results in a very general core-junk network model, wherein some vertices do not correspond between the graph pair.

Keywords: graph matching, random graphs, singular value thresholding

1. Introduction and background

The graph matching problem (GMP) seeks to find an alignment between the vertex sets of two graphs that best preserves common structure across graphs. At its simplest, it can be formulated as follows: given two adjacency matrices $A$ and $B$ corresponding to $n$-vertex graphs, the GMP seeks to minimize $\|A - \Pi B \Pi^T\|_F$ over permutation matrices $\Pi$; i.e., the GMP seeks a relabelling of the vertices of $B$ that minimizes the number of induced edge disagreements between $A$ and $B$. Variants and extensions of this problem have been extensively studied in the literature, with applications across areas as diverse as biology and neuroscience [10,18,30,58], computer vision [20,35,45], pattern recognition [33,48,62] and social network analysis [25,34,59], among others. For a survey of many of the recent applications and approaches to the GMP, see the sequence of survey papers [12,19,23]. While recent results [3] have whittled away at the complexity of the related graph isomorphism problem (determining whether a permutation matrix $\Pi$ exists satisfying $A = \Pi B \Pi^T$), at its most general, where $A$ and $B$ are allowed to be weighted and directed, the GMP is known to be NP-hard. Indeed, in this case, the GMP is equivalent to the notoriously difficult quadratic assignment problem [7,8,36]. However, recent approaches that leverage efficient representation/learning methodologies (see, for example, [5,25,59]) have shown excellent empirical performance matching networks with up to millions of nodes.
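To make the objective concrete, here is a minimal brute-force sketch in Python (NumPy), assuming small unweighted, undirected graphs. The helper names `gm_objective` and `brute_force_gmp` are hypothetical, and the exhaustive search over all $n!$ permutations is only an illustration, not a practical solver. The objective $\|A - \Pi B \Pi^T\|_F^2$ is evaluated by re-indexing $B$ rather than forming $\Pi$ explicitly.

```python
import itertools

import numpy as np


def gm_objective(A, B, perm):
    """Squared GMP objective ||A - Pi B Pi^T||_F^2 for the relabelling `perm`.

    perm[i] = j means vertex i of A is matched to vertex j of B, so that
    (Pi B Pi^T)[i, k] = B[perm[i], perm[k]].
    """
    idx = np.asarray(perm)
    return float(np.sum((A - B[np.ix_(idx, idx)]) ** 2))


def brute_force_gmp(A, B):
    """Exhaustively minimise the GMP objective; only feasible for tiny n (n! permutations)."""
    best_perm, best_val = None, np.inf
    for perm in itertools.permutations(range(A.shape[0])):
        val = gm_objective(A, B, perm)
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val


# A 4-vertex path graph and a relabelled (isomorphic) copy of it.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
relabel = [2, 0, 3, 1]
B = A[np.ix_(relabel, relabel)]
perm, val = brute_force_gmp(A, B)
# For unweighted undirected graphs the objective is twice the number of
# edge disagreements, so val == 0 certifies that perm is an isomorphism.
```

Squaring the norm does not change the minimizer, and it keeps the value an integer count (twice the number of disagreeing vertex pairs) for 0/1 adjacency matrices.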

In addition to algorithmic advancements in graph matching, there has been a flurry of activity studying the closely related problem of graph matchability: given a latent alignment between the vertex sets of two graphs, can graph matching uncover this alignment in the presence of shuffled vertex labels? This problem arises in a variety of contexts, from network de-anonymization and privatization to multi-network hypothesis testing [37] to multimodality graph embedding methodologies [10]. Many existing results are concerned with recovering a latent alignment present across random graph models where $A$ and $B$ have identical marginal distributions, and exciting advancements on the threshold of matchable vs. unmatchable graphs have been made across many random graph settings, including the homogeneous correlated Erdős–Rényi model (see, for example, [4,17,39,44]), the correlated stochastic blockmodel setting (see, for example, [37,43]), the $\rho$-correlated heterogeneous Erdős–Rényi model (see, for example, [38,40]), and the correlated heterogeneous Erdős–Rényi model with varying edge correlations (see, for example, [40,49]). In the non-identically distributed model setting, the work in [13–15] provides theoretic phase transitions on matchability in the $\mathrm{ER}(p, q, \rho)$ model (i.e., $A \sim$ Erdős–Rényi$(n, p)$, $B \sim$ Erdős–Rényi$(n, q)$, and the edge correlation across graphs is given by the constant $\rho$; see Definition 1.2).

The above results range from providing theoretic phase transitions on matchability [13,14,37] to providing nearly efficient methods for achieving matchability from an algorithmic perspective [4,15,17,21]. While they have served to establish a novel theoretical understanding of the matchability problem, in each case the transition from matchable to unmatchable graphs is defined in terms of decreasing across-graph correlation and within-graph sparsity. Importantly, it is not a function of fundamentally different probabilistic structures across the graphs to be matched. As we often witness in applications, the graph topologies can differ significantly even among vertices that correspond to the same entity across networks. Social networks offer a compelling example of this, where matching across different social network platforms requires the understanding that not all users will be behaving homogeneously across different network platforms [34]. Both theoretically (see Example 2.2) and practically (see, for example, [10]), this distributional heterogeneity can have a deleterious effect on graph matchability.

Herein, we propose one possible solution for ameliorating the effect of $A$ and $B$ not being identically distributed, namely via a universal singular value thresholding (USVT) [9] centering preprocessing step. Working in a general correlated edge-independent random graph model (see Definition 1.2), we theoretically demonstrate that USVT centering asymptotically almost surely recovers matchability for all but a vanishing fraction of the nodes (see Theorem 2.5). In addition, we recover an analogous result (see Theorem 2.9) in the setting in which only a fraction of the vertices possess a true correspondence across networks, generalizing and extending the results of [29,44,56].

This centering step is practically implementable on even very large networks and is demonstrated to have a significant positive impact on graph matchability in both real and synthetic data settings (see Section 3). While the results contained herein do not guarantee that any computationally efficient algorithm will be able to perfectly (or almost perfectly) align any given networks after USVT centering, they provide a theoretical foundation for subsequently studying algorithmic effectiveness. Indeed, they ensure that with high probability the optimal alignment according to the graph matching objective function is, essentially, the true latent vertex alignment, guaranteeing that subsequent optimization procedures are, at the least, seeking the right permutation.

1.1 Notation

The following notation will be used throughout the manuscript: for $n \in \mathbb{N}$, we will let $I_n$ denote the $n \times n$ identity matrix and $J_n$ the $n \times n$ matrix of all 1's (so that $J_n - I_n$ is the hollow matrix with all 1's on its off-diagonal), and $[n]$ will denote the set $\{1, 2, \dots, n\}$. We will consider $A$ and $B$ interchangeably as adjacency matrices and as the corresponding graphs consisting of vertices and edges. For a set $S \subseteq [n]$, we denote by $A[S]$ the induced subgraph of $A$ on the vertices of $S$.

For a matrix $M \in \mathbb{R}^{n \times m}$, the Frobenius norm of $M$ is defined as

$$\|M\|_F = \sqrt{\textstyle\sum_{i=1}^{n} \sum_{j=1}^{m} M_{ij}^2},$$

and the operator norm of $M$ is denoted

$$\|M\|_2 = \sigma_1(M),$$

where $\sigma_1(M)$ is the largest singular value of $M$. We denote the of via

Below we will make use of the following trace form of the Frobenius norm: $\|M\|_F^2 = \operatorname{tr}(M^T M)$; see [28] for more on the Frobenius norm and its many uses. For matrices and , we define via

where $J$ denotes the matrix of all 1's.
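The norm definitions and the trace identity above can be checked numerically. The following NumPy sketch uses an arbitrary random test matrix `M`, an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))                  # arbitrary test matrix

fro_entrywise = np.sqrt(np.sum(M ** 2))          # sqrt(sum_ij M_ij^2)
fro_trace = np.sqrt(np.trace(M.T @ M))           # trace form: ||M||_F^2 = tr(M^T M)
sigma = np.linalg.svd(M, compute_uv=False)       # singular values, descending
op_norm = sigma[0]                               # ||M||_2 = sigma_1(M)

# ||M||_F^2 is also the sum of the squared singular values, hence ||M||_2 <= ||M||_F.
```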

We will also make extensive use of modern asymptotic notation. To review, if $f = f(n)$ and $g = g(n)$ are non-negative functions of $n$, then we write

1.2 Correlated heterogeneous Erdős–Rényi graphs

Formally the GMP we will consider is defined as follows.

Definition 1.1

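Let $A, B \in \mathbb{R}^{n \times n}$ be the adjacency matrices of weighted, undirected graphs on $n$ vertices. The GMP is to find an element of

$$\arg\min_{\Pi \in \Pi_n} \|A - \Pi B \Pi^T\|_F = \arg\max_{\Pi \in \Pi_n} \operatorname{tr}(A \Pi B \Pi^T), \qquad (1.1)$$

where $\Pi_n$ is the set of $n \times n$ permutation matrices.

The equality in Eq. (1.1) follows here from the trace form of the Frobenius norm: as $A$ is symmetric and $\|\Pi B \Pi^T\|_F = \|B\|_F$ for every $\Pi \in \Pi_n$,

$$\|A - \Pi B \Pi^T\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2 \operatorname{tr}(A \Pi B \Pi^T).$$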

We note here that, traditionally, the GMP formulated in Definition 1.1 is defined for unweighted graphs $A$ and $B$. The extension we consider to weighted graphs is commonly used in the literature (see, for example, the work in [53]) and is useful for studying situations in which edges/vertices in the network have weight features attached to them. This added flexibility will be needed for subsequent theoretical developments and data applications.

In the presence of a latent vertex alignment, $\sigma: [n] \to [n]$, between the vertices of $A$ and $B$, we wish to understand the extent to which graph matching $A$ and $B$ will recover $\sigma$; i.e., if $\Pi_\sigma$ is the permutation matrix corresponding to $\sigma$, will $\arg\min_{\Pi \in \Pi_n} \|A - \Pi B \Pi^T\|_F = \{\Pi_\sigma\}$? In order to study this problem from a probabilistic perspective, we introduce a bivariate random graph model with a natural vertex alignment across graphs: the bivariate, correlated, heterogeneous Erdős–Rényi random graph.

Definition 1.2

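For symmetric matrices $P, Q, R \in [0, 1]^{n \times n}$, we say that $A$ and $B$ are instantiations of the $R$-correlated heterogeneous Erdős–Rényi random graph model with parameters $(P, Q, R)$ (abbreviated as $(A, B) \sim \mathrm{ER}(P, Q, R)$) if

$A \sim \mathrm{ER}(P)$; i.e., $A$ is an independent edge random graph with no self-loops satisfying
$A_{ij} \sim \mathrm{Bernoulli}(P_{ij})$ for each $i < j$;

$B \sim \mathrm{ER}(Q)$; i.e., $B$ is an independent edge random graph with no self-loops satisfying
$B_{ij} \sim \mathrm{Bernoulli}(Q_{ij})$ for each $i < j$;

Edges across networks are collectively independent except that for each $i < j$, the correlation between $A_{ij}$ and $B_{ij}$ is $R_{ij}$.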
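A minimal sketch of how one might sample from this model, assuming NumPy and that each correlation is feasible for its marginals; the function name `sample_corr_er` is hypothetical. Each correlated pair is drawn by matching the joint probability $\mathbb{P}(A_{ij} = B_{ij} = 1) = P_{ij} Q_{ij} + R_{ij}\sqrt{P_{ij}(1 - P_{ij}) Q_{ij}(1 - Q_{ij})}$ and splitting the unit interval into the four joint outcomes.

```python
import numpy as np


def sample_corr_er(P, Q, R, rng=None):
    """Draw (A, B) from the correlated heterogeneous ER model (a sketch).

    Only the upper triangle (i < j) is sampled, so both graphs are undirected
    and loop-free; any diagonal entries of P and Q are ignored.
    """
    rng = np.random.default_rng(rng)
    iu = np.triu_indices(P.shape[0], k=1)
    p, q, r = P[iu], Q[iu], R[iu]
    # P(A_ij = 1, B_ij = 1) implied by the marginals and the correlation R_ij.
    p11 = p * q + r * np.sqrt(p * (1 - p) * q * (1 - q))
    lo, hi = np.maximum(0.0, p + q - 1), np.minimum(p, q)
    if np.any(p11 < lo - 1e-12) or np.any(p11 > hi + 1e-12):
        raise ValueError("R is infeasible for the given marginals P and Q")
    # Split [0, 1) into the cells (1,1) | (1,0) | (0,1) | (0,0); one uniform per pair.
    u = rng.random(p.shape)
    a = u < p
    b = (u < p11) | ((u >= p) & (u < p + q - p11))
    A = np.zeros(P.shape, dtype=int)
    B = np.zeros(P.shape, dtype=int)
    A[iu], B[iu] = a, b
    return A + A.T, B + B.T
```

A single uniform per vertex pair keeps the construction vectorized; any other coupling with the same marginals and correlation would serve equally well.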

Before proceeding further, we will make a few remarks on the $\mathrm{ER}(P, Q, R)$ random graph model. In the homogeneous ER model, network growth as $n \to \infty$ is natural, and we can consider an asymptotic regime in which $p = p_n$ depends on $n$. Here, we similarly consider $P$, $Q$ and $R$ to be dependent on $n$, but make no further assumptions on how exactly the dependence on $n$ is manifest. This allows us to consider classical homogeneous Erdős–Rényi graphs, stochastic blockmodels [27], random dot product graphs (conditioned on the latent positions) [57], etc., as subfamilies of our random graph model.

In addition, by allowing $P$ and $Q$ to differ, this model allows for a latent correspondence to exist in settings where the underlying topology and degree structure of the graphs to be matched differ significantly. This distributional heterogeneity is often observed in real data settings (see, for example, the connectomes being aligned in [10] and the social networks aligned in [34]), and we seek to understand the limitations of graph matching approaches when attempting to overcome this heterogeneity. Note also that when $P \neq Q$, there are restrictions on the feasible correlations $R$. Indeed, if $X \sim$ Bernoulli$(p)$ and $Y \sim$ Bernoulli$(q)$ are $\rho$-correlated with $p \neq q$, then, since $\mathbb{P}(X = Y = 1) \le \min(p, q)$, the correlation must satisfy $\rho \le \sqrt{\frac{\min(p, q)\,(1 - \max(p, q))}{\max(p, q)\,(1 - \min(p, q))}} < 1$.
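The feasibility restriction can be made explicit with the Fréchet bounds $\max(0, p + q - 1) \le \mathbb{P}(X = Y = 1) \le \min(p, q)$. The short Python sketch below (the name `bernoulli_corr_range` is hypothetical) returns the resulting interval of attainable correlations; for $p \le q$ its upper end equals $\sqrt{p(1 - q)/(q(1 - p))}$, which is strictly below 1 whenever $p < q$.

```python
import math


def bernoulli_corr_range(p, q):
    """Feasible correlations between X ~ Bern(p) and Y ~ Bern(q), for 0 < p, q < 1.

    P(X = Y = 1) must lie in [max(0, p + q - 1), min(p, q)] (Frechet bounds), and
    corr(X, Y) = (P(X = Y = 1) - p q) / sqrt(p (1 - p) q (1 - q)).
    """
    s = math.sqrt(p * (1 - p) * q * (1 - q))
    return (max(0.0, p + q - 1) - p * q) / s, (min(p, q) - p * q) / s
```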

Lastly, this model naturally allows us to consider a partition of $[n]$ into core ($\mathcal{C}$) and junk ($\mathcal{J}$) vertices, $[n] = \mathcal{C} \cup \mathcal{J}$; core vertices are those that have a corresponding vertex (i.e., a true match) across networks, while junk vertices do not. If we consider $(A, B) \sim \mathrm{ER}(P, Q, R)$ with $R$ of block form, vanishing outside a principal core block, then it is reasonable to take $\mathcal{C}$ to be the index set of that block and $\mathcal{J} = [n] \setminus \mathcal{C}$. For all $i \in \mathcal{J}$ it would then hold that, for all $j \in [n]$, $A_{ij}$ and $B_{ij}$ are independent random variables. A natural question to ask is when an optimal GM algorithm will correctly align the vertices in $\mathcal{C}$ across networks. This problem was studied in the context of homogeneous ER networks with constant correlation in [29], and the results in Section 2.4 generalize and extend those in [29] to this more adaptable network model.

Remark 1.1

In what follows, $P$ and $Q$ are not necessarily assumed to be hollow matrices, and we allow for non-zero entries on the diagonals of $P$ and $Q$. This is done to simplify eigen-decompositions in our proof methods. We do assume that our graphs $A$ and $B$ are loop-free (i.e., have no self-edges). As such, $\mathbb{E}(A)$ (resp., $\mathbb{E}(B)$) is necessarily hollow and need not equal $P$ (resp., $Q$). We do have that for $i \neq j$, $\mathbb{E}(A_{ij}) = P_{ij}$ (resp., $\mathbb{E}(B_{ij}) = Q_{ij}$), but this will not hold when $i = j$ if $P_{ii} \neq 0$ (resp., $Q_{ii} \neq 0$).

2. Graph matchability

In the $\mathrm{ER}(P, Q, R)$ setting, we seek to understand when a graph matching procedure could correctly align the vertices across networks; i.e., when $\arg\min_{\Pi \in \Pi_n} \|A - \Pi B \Pi^T\|_F = \{I_n\}$, where $I_n$ denotes the $n \times n$ identity matrix. More generally, if $d$ is a dissimilarity (i.e., if it is a symmetric, non-negative function with $d(X, X) = 0$ for all $X$; see, for example, [51]), when is it the case that $\arg\min_{\Pi \in \Pi_n} d(A, \Pi B \Pi^T) = \{I_n\}$?

In this more general framework, we consider the following definition of graph matchability.

Definition 2.1

Let $d$ be a dissimilarity. We will say that $A$ and $B$ are $d$-matchable if

$$\arg\min_{\Pi \in \Pi_n} d(A, \Pi B \Pi^T) = \{I_n\},$$

where $I_n$ denotes the $n \times n$ identity matrix.

By considering an appropriate $d$ in Definition 2.1, we can fit the classical GMP into the $d$-matchability formulation; indeed, the GMP of Definition 1.1 considers the dissimilarity defined via $d(A, \Pi B \Pi^T) = \|A - \Pi B \Pi^T\|_F$.

In this paper, we will consider, more generally, dissimilarity functions of the form

for a suitably defined matrix-valued function

Special cases of interest in our present $\mathrm{ER}(P, Q, R)$ setting are

(2.1)

(2.2)

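where $\hat{P}$ (resp., $\hat{Q}$) is a suitable estimate of $P$ (resp., $Q$) derived from $A$ (resp., $B$). In addition to the notion of $d$-matchability for a dissimilarity $d$, we will also define the notion of oracle-matchability. We will say that $A$ and $B$ distributed as $\mathrm{ER}(P, Q, R)$ are oracle-matchable if $\arg\min_{\Pi \in \Pi_n} \|(A - P) - \Pi (B - Q) \Pi^T\|_F = \{I_n\}$.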

In the sequel, oracle-matchability will provide a useful theoretic bridge between -matchability and -matchability. Note that we will write -matchability and -matchability for the notions defined in Eqs. (2.1) and (2.2), respectively.
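For very small graphs, matchability under a Frobenius-type dissimilarity can be checked by exhaustive search. The sketch below (hypothetical helper `is_matchable`, NumPy assumed) tests whether the identity is the unique minimizer: passing $A, B$ gives the plain GMP dissimilarity, while passing $A - P, B - Q$ gives the oracle-centered version. Ties count as failures, since the argmin must be exactly $\{I_n\}$.

```python
import itertools

import numpy as np


def is_matchable(A_t, B_t, tol=1e-12):
    """Brute-force check that the identity is the *unique* minimiser of
    ||A_t - Pi B_t Pi^T||_F over all permutation matrices (tiny n only).

    Use A_t, B_t = A, B for the plain GMP dissimilarity, or
    A_t, B_t = A - P, B - Q for the oracle-centred one.
    """
    n = A_t.shape[0]
    base = np.sum((A_t - B_t) ** 2)
    for perm in itertools.permutations(range(n)):
        if perm == tuple(range(n)):
            continue
        idx = np.asarray(perm)
        # A tie with the identity also violates matchability.
        if np.sum((A_t - B_t[np.ix_(idx, idx)]) ** 2) <= base + tol:
            return False
    return True
```

For example, a path graph matched to itself is not matchable in this sense, because its reversal automorphism ties with the identity.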

A natural question to ask is why we define the GMP in terms of the Frobenius norm and not in terms of a more general dissimilarity $d$; indeed, alternate dissimilarities have been considered in the definition of the GMP in the graph matching literature (see, for example, [60,61]). Moreover, we consider the GMP objective function formulation in Definition 1.1 even though in numerous settings the optimal solution to this GMP may not be a given latent vertex alignment, and in this section we will see instances in which $A$ and $B$ are, with high probability, not -matchable. Our choice of the Frobenius norm in the GMP is motivated by two main factors. First, this is the classical definition of graph matching and ties our current work to a vast graph matching literature. Second, we seek to understand conditions under which the original -matchability fails, yet there is a suitable dissimilarity $d$ for which $d$-matchability is achieved. As the formulation in Definition 1.1 is commonly used in practice, this could provide practical guidance for when vertex labels can be recovered via a different objective function viewpoint.

In recent work addressing the question of -matchability, results have been established for the , setting (see, for example, [4,17,39,44]), in the correlated stochastic blockmodel setting (see, for example, [37,43]), in the correlated heterogeneous Erdős–Rényi model (see, for example, [38,40]), and in the general and general setting (see, for example, [40,49]). In the non-identically distributed model setting, the work in [13–15] considers , and . In each setting, the results showed that for sufficiently dense, sufficiently correlated graphs, -matchability is almost surely achieved. Converse results in [13,14,37,39] show that in the sufficiently sparse and/or weakly correlated setting, -matchability is a.s. lost (i.e., a.s. the solution to the GMP is not the latent alignment). The work in [13,14] deserves special mention, as the converse results therein are proven for general $d$-matchability; i.e., for sufficiently sparse and/or weakly correlated networks, $d$-matchability is a.s. lost for all dissimilarities $d$.

In these examples, it is sparsity and/or weak dependence that is potentially thwarting the matching in each instance and not the heterogeneity of the model itself. As the next straightforward but illustrative example demonstrates, the degree and structural heterogeneity across networks allowed for in the $\mathrm{ER}(P, Q, R)$ model makes the question of -matchability a bit more nuanced.

Example 2.2

Consider the following correlated heterogeneous stochastic blockmodel example. Let be distinct, and define

Let be the vertices in block 1 in and , and let be the vertices in block 2. Assuming and, letting and we consider of the form

Unlike in the cases where the loss of -matchability is due to network sparsity and/or weak correlation, in this example the non-identically distributed nature of $A$ and $B$ can obfuscate the true alignment from a graph matching perspective. Indeed, for many choices of the parameters above, the optimal permutation for the GMP in Definition 1.1 will not be the latent correspondence, and permuting blocks 1 and 2 will, with high probability, yield a better GMP objective function value.

To wit, let be any permutation such that (so that aligns block 1 in to block in and vice versa) with corresponding permutation . The number of edges such that is bounded above by and

Therefore,

Combined, we see that the difference in the objective function for as compared to is

Numerous choices of the parameters in this model (for example, , , , , ) yield that for a positive constant (in the example, ). As is highly concentrated about its expectation (see Appendix A.1), there is high probability that , and

i.e., and would not be -matchable.
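The mechanism in Example 2.2 can be reproduced in a small simulation. The parameters below are hypothetical (the example's specific values are not reproduced here): block 1 is dense in $A$ but sparse in $B$, block 2 is the reverse, and every pair has correlation $0.1$, which is feasible since $0.1 < 1/9$ for Bernoulli$(0.9)$ and Bernoulli$(0.1)$ marginals. The sketch compares the GMP objective at the identity with that at the block-swapping permutation, both for the raw graphs and after oracle centering by $P$ and $Q$, whose diagonals are zeroed so that the centering uses $\mathbb{E}(A)$ and $\mathbb{E}(B)$ exactly.

```python
import numpy as np


def sample_corr_er(P, Q, R, rng):
    """Draw an undirected, loop-free pair (A, B) ~ ER(P, Q, R); assumes R is feasible."""
    iu = np.triu_indices(P.shape[0], k=1)
    p, q, r = P[iu], Q[iu], R[iu]
    p11 = p * q + r * np.sqrt(p * (1 - p) * q * (1 - q))   # P(A_ij = B_ij = 1)
    u = rng.random(p.shape)
    a = u < p
    b = (u < p11) | ((u >= p) & (u < p + q - p11))
    A, B = np.zeros(P.shape), np.zeros(P.shape)
    A[iu], B[iu] = a, b
    return A + A.T, B + B.T


def objective(X, Y, perm):
    """||X - Pi Y Pi^T||_F^2 for the relabelling perm (vertex i -> perm[i])."""
    return float(np.sum((X - Y[np.ix_(perm, perm)]) ** 2))


n, m = 300, 150
z = np.repeat([0, 1], m)                                # block memberships
# Hypothetical parameters: block 1 is dense in A but sparse in B, and vice versa.
P = np.array([[0.9, 0.5], [0.5, 0.1]])[np.ix_(z, z)]
Q = np.array([[0.1, 0.5], [0.5, 0.9]])[np.ix_(z, z)]
np.fill_diagonal(P, 0.0)                                # hollow, so P = E[A] exactly
np.fill_diagonal(Q, 0.0)
R = np.full((n, n), 0.1)                                # feasible: 0.1 < 1/9
A, B = sample_corr_er(P, Q, R, np.random.default_rng(0))

identity = np.arange(n)
swap = np.concatenate([np.arange(m, n), np.arange(m)])  # block 1 <-> block 2

uncentered = {"identity": objective(A, B, identity), "swap": objective(A, B, swap)}
centered = {"identity": objective(A - P, B - Q, identity),
            "swap": objective(A - P, B - Q, swap)}
```

With these settings one should expect the uncentered objective to favour the block swap and the centered objective to favour the identity; this is a single simulated draw illustrating the example, not a proof.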

2.1 Centering to recover matching

In the previous example, we see that $P \neq Q$ can effectively make $A$ and $B$ not -matchable. One way to recover the latent alignment in this heterogeneous setting is to transform the problem back into the homogeneous case: rather than matching $A$ and $B$, we would match $A - P$ and $B - Q$, whose off-diagonal entries once again share a common (zero) mean. As the next theorem demonstrates, this is sufficient to a.s. recover oracle-matchability under mild model conditions. Before stating the theorem, we first must define some additional notation. For , and permutation , define the matrix via

(2.3)

For each $\Pi$ in $\Pi_n$, define

(2.4)

Theorem 2.3

Let and consider and . For each , define

If for all and all , we have

then

The proof of Theorem 2.3 relies on a now standard application of McDiarmid’s inequality, and is similar to the proofs of analogous matchability results in [37–39]; details of the proof can be found in Appendix A.1.

Remark 2.1

The growth condition on in Theorem 2.3, namely , is attempting to capture the necessary degree to which the entry-wise covariance matrix needs to be asymmetric. If we define , then from Eqs. (A.4) and (A.3) we have that for ,

and if then . Constraining globally and not entry-wise allows for more flexibility in applying the theorem to settings where some of the edges are very sparse or weakly correlated.

Consider the growth condition on , namely , in the , homogeneous ER setting (w.l.o.g., assume ). In this setting, as , and are -matchable iff and are -matchable. In the sparse setting of [13, Theorem 1], -matchability is achieved with high probability when all of the following hold:

(2.5)

(2.6)

(2.7)

(2.8)

Note that these conditions cannot simultaneously hold when

Indeed if , Eq. (2.6) implies , and hence . Therefore,

contradicting Eq. (2.7). In the setting, modulo the sparsity conditions, there is a -matchability phase transition at , and the corresponding rate achieved in Theorem 2.3, namely , is above this phase transition threshold. We view this as the price paid in the theorem for being able to handle both heterogeneous and homogeneous ER settings.

In this setting, the growth condition of Theorem 2.3, , can hold in the dense setting as well as when . In the dense setting of , -matchability transitions at [37,39], which our theorem recovers (asymptotically).

In [13], the authors establish a phase transition at , providing the corresponding converse result that ensures no $d$-matchability if for all dissimilarities $d$. We do not derive a corresponding converse result to Theorem 2.3 herein (namely, a condition on that ensures $A$ and $B$ are not $d$-matchable for any suitable $d$), as we are focused on how to practically recover -matchability in the non-identically distributed setting; see Theorem 2.5.

2.2 Approximate centering to almost recover matching

Unfortunately, centering $A$ and $B$ by the true edge probability matrices $P$ and $Q$ is impractical, as these model parameters are unknown in practice. Our solution to this hurdle is to estimate the unknown $P$ and $Q$ via USVT [9], and then approximately center the networks via these estimates.

Our method for estimating $P$ and $Q$ is based on the USVT method of [9]; USVT as applied in the present setting is outlined in Algorithm 1.

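Algorithm 1 USVT for estimating $P$

Input: Adjacency matrix $A$, threshold $\tau$;

1. Let $A = \sum_{i=1}^{n} s_i u_i v_i^T$ be the singular value decomposition of $A$, with singular values ordered via $s_1 \ge s_2 \ge \cdots \ge s_n \ge 0$;

2. Let $S = \{i : s_i > \tau\}$ index the singular values greater than the threshold $\tau$;

3. Define $\tilde{P} = \sum_{i \in S} s_i u_i v_i^T$.

Output: $\hat{P}$ defined via $\hat{P}_{ij} = \tilde{P}_{ij}$ for all $i, j$ with $\tilde{P}_{ij} \in [0, 1]$, and $\hat{P}_{ij} = \min\{\max\{\tilde{P}_{ij}, 0\}, 1\}$ (i.e., $\tilde{P}_{ij}$ truncated to $[0, 1]$) for $\tilde{P}_{ij} \notin [0, 1]$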
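A compact NumPy sketch of Algorithm 1 follows. The name `usvt` is hypothetical, and the threshold $\tau = 2.01\sqrt{n}$ in the usage lines is an assumption in the spirit of the $(2 + \eta)\sqrt{n}$ thresholds of [9], not necessarily the constant used in Theorem 2.5. Clipping to $[0, 1]$ implements the output step, and the two-block stochastic blockmodel is toy data.

```python
import numpy as np


def usvt(A, tau):
    """USVT estimate of the edge-probability matrix, following Algorithm 1.

    Returns the clipped low-rank estimate P_hat and the number of singular
    values kept (|S| in step 2).
    """
    U, s, Vt = np.linalg.svd(A)                       # step 1: s sorted in decreasing order
    keep = s > tau                                    # step 2: singular values above tau
    P_tilde = (U[:, keep] * s[keep]) @ Vt[keep, :]    # step 3: truncated SVD
    return np.clip(P_tilde, 0.0, 1.0), int(keep.sum())  # output: entries in [0, 1]


# Illustrative two-block SBM with hypothetical parameters.
rng = np.random.default_rng(0)
n = 400
z = np.repeat([0, 1], n // 2)
P = np.array([[0.6, 0.2], [0.2, 0.6]])[np.ix_(z, z)]
A = np.triu((rng.random((n, n)) < P).astype(float), k=1)
A = A + A.T                                           # undirected, no self-loops
tau = 2.01 * np.sqrt(n)                               # assumed threshold, (2 + eta) * sqrt(n)
P_hat, rank_kept = usvt(A, tau)
```

Because $P$ has entries in $[0, 1]$, truncating $\tilde{P}$ to $[0, 1]$ can only move each entry closer to $P$, so the clipping step never increases the estimation error.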

If we estimate $P$ via $\hat{P}$ and $Q$ via $\hat{Q}$ using USVT, can we recover -matchability using the approximately centered matrices, $A - \hat{P}$ and $B - \hat{Q}$, for a suitable USVT threshold? Given the error introduced in estimating the edge probability matrices, the answer is unsurprisingly 'no', at least for the proof techniques we employ herein. However, if we slightly weaken Definition 2.1 to allow for a vanishing fraction of unmatched nodes, then we can recover an analogous result to Theorem 2.3. This motivates the following definition:

Definition 2.4

Let be a dissimilarity. Consider random graphs . We say that and are -matchable if

Unwrapping Definition 2.4, we see that -matchability is equivalent to any optimal permutation (under dissimilarity $d$) correctly recovering the labels of at least vertices across $A$ and $B$. As the following theorem indicates, under mild model assumptions, $A$ and $B$ are -matchable asymptotically almost surely, where the estimates $\hat{P}$ and $\hat{Q}$ in Eq. (2.2) are the USVT estimates. The proof of Theorem 2.5 can be found in Appendix A.2.
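For tiny graphs one can compute the quantity behind Definition 2.4 directly: among all minimizers of a Frobenius-type objective, the smallest fraction of vertices that are correctly matched. The brute-force sketch below (hypothetical helper `worst_case_fraction_correct`, NumPy assumed) does not encode the exact threshold of Definition 2.4; it only illustrates the "every optimal permutation" quantifier.

```python
import itertools

import numpy as np


def worst_case_fraction_correct(A_t, B_t, tol=1e-12):
    """Smallest fraction of vertices matched correctly (perm[i] == i) among all
    minimisers of ||A_t - Pi B_t Pi^T||_F. Brute force: tiny n only."""
    n = A_t.shape[0]
    scores = {}
    for perm in itertools.permutations(range(n)):
        idx = np.asarray(perm)
        scores[perm] = np.sum((A_t - B_t[np.ix_(idx, idx)]) ** 2)
    best = min(scores.values())
    minimisers = [perm for perm, val in scores.items() if val <= best + tol]
    return min(np.mean(np.asarray(perm) == np.arange(n)) for perm in minimisers)
```

On a 5-vertex path matched to itself, the reversal automorphism ties with the identity, so only the middle vertex (one fifth of the vertices) is guaranteed to be matched correctly.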

Theorem 2.5

Let and further assume that for each ,

(i) There exists such that entry-wise. Note that for each , is fixed, though we allow to vary in .

(ii) We have that .

(iii) is approximately low rank in that there exists a such that where are the singular values of .

If for all and all , we have that there exists such that

and for each , there exist constants such that if is the USVT estimate of with threshold level and if

then and satisfy

Let us take a moment to explore the assumptions in Theorem 2.5. Assumptions (i) and (ii) control the allowable sparsity of the networks, ensuring that the minimum expected degree grows asymptotically faster than . If the mean expected degree were , then the graphs would be a.s. disconnected [6], and our proof techniques would fail, as $A$ and $B$ would no longer concentrate about $P$ and $Q$ with high probability [32]. The rank assumption in (iii) is needed to control the accuracy of the USVT estimates of the unknown edge probability matrices. Practically, smaller allow us to use suitable low-rank estimates of that are computationally easier to implement; this is indeed the case in many common random graph models such as the stochastic blockmodel [27] (where often ), random dot product graphs [57] and latent position random graphs [26] (where is often taken to be [52]), among others.

If is bounded away from entry-wise, and each entry of is (which is indeed the case in the oft-adopted setting, where each for a matrix with entries of order ), then as defined in Remark 2.1 satisfies . We then have . From Eq. (A.10) in the proof of Theorem 2.5, we see that -matchability is achieved here for

If, in addition, and each , then up to a logarithmic factor and are -matchable, and an oracle graph matching algorithm would properly align all but potentially a vanishing fraction of the nodes across the graphs.

Remark 2.2

In Theorem 2.5, we estimate $P$ via $\hat{P}$ using USVT with threshold . In applications, suitable estimates of $P$ can often be obtained with rank of order or [52], especially in the setting of latent space graph models. For the purposes of our proof approach, suitably good means that (similarly for $Q$). We do not explore this model selection question further here (i.e., estimating a suitable rank rather than a threshold for our USVT estimates), as in applications often only a relatively small number of singular values are above the USVT threshold.

2.3 When to center?

We have seen above that in the setting where $P \neq Q$ and $A$ and $B$ are not -matchable, centering $A$ and $B$ via $P$ and $Q$ can recover oracle-matchability by ameliorating the effect of the differing edge probability matrices. Moreover, approximately centering by $\hat{P}$ and $\hat{Q}$ theoretically recovers -matchability for all but a vanishing fraction of the vertices. A natural question is whether, in the case when $P = Q$, Theorem 2.5 implies that a.s. perfect -matchability is potentially lost when USVT centering is performed unnecessarily.

Consider the following simple example, where and , , and we vary and . In this example, there is no need to center before matching, and the variability introduced by estimating the edge probability matrices could potentially cause

Fortunately, at least in this example, we see that this is not the case (see Fig. 1). As matching these graphs exactly (i.e., finding the argmin of the GMP) is computationally challenging, we use as a surrogate for -matchability and -matchability whether the true alignment is a local (rather than global) minimum of the GMP before and after centering. To test this, we match the graph pairs (USVT centered and uncentered) using the constrained gradient-ascent based graph matching algorithm FAQ [54], initialized at the true correspondence $I_n$. While FAQ is not guaranteed to terminate at a local minimum, if it terminates at $I_n$, then that is evidence in support of the local optimality of $I_n$. Moreover, if FAQ does not terminate at $I_n$, then $I_n$ is not a local minimum of the GMP.

Let FAQ$(A, B)$ (resp., FAQ$(A - \hat{P}, B - \hat{Q})$) denote the number of vertices correctly matched by FAQ initialized at $I_n$ when $A$ and $B$ are matched directly (resp., when $A$ and $B$ are USVT centered before they are matched). In Fig. 1, we plot the mean (±1 s.d.) of the difference between these two counts over and . From the figure, we see that no significant performance is lost by centering the graphs first. Indeed, in highly structured/low-rank settings (e.g., homogeneous ER or SBM), we can obtain high-fidelity estimates of the individual entries of $P$ and $Q$, as opposed to the global estimates used in the current proof. These local estimates (which can also be obtained by non-spectral methods, e.g., in the case we can use ) should allow for significantly less error to be introduced in the estimation of the edge probability matrices and will allow for little-to-no theoretic degradation due to centering. As we are more focused on the general $\mathrm{ER}(P, Q, R)$ case, we do not pursue this further here.

2.4 Core matchability

Often in applications, only a fraction of the vertices in $A$ possess a latent matched pair in $B$. We will denote those vertices that have a latent match across graphs as the core vertices (denoted $\mathcal{C}$), and we will denote those vertices that do not have a latent match across graphs as the junk vertices (denoted $\mathcal{J}$). In this section, we seek to further understand the ability of an oracle graph matching procedure to correctly match the cores across graphs. This motivates the following definition of core-matchability.

Definition 2.6

Let , and consider a partition of the vertex sets into core and junk vertices,

Define to be the set of core matching permutations. For dissimilarity , we say that and are core -matchable if

i.e., if any optimal permutation aligning and under perfectly matches the cores across networks.

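If we consider $(A, B) \sim \mathrm{ER}(P, Q, R)$ with $R$ of block form, vanishing outside a principal core block, then it is reasonable to take $\mathcal{C}$ to be the index set of that block and $\mathcal{J} = [n] \setminus \mathcal{C}$. Indeed, under these model assumptions, $A_{ij}$ and $B_{ij}$ are independent if either $i$ or $j$ is in $\mathcal{J}$.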
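A small sketch of the core-junk construction, assuming a constant correlation $\rho$ on the core block and core vertices listed first (both simplifications, with hypothetical helper names). The first function builds such an $R$; the second checks whether a permutation fixes every core vertex, which is how we read membership in the set of core matching permutations of Definition 2.6.

```python
import numpy as np


def core_junk_correlation(n_core, n_junk, rho):
    """Hypothetical helper: R_ij = rho when i and j are both core vertices, else 0.

    Vertices 0..n_core-1 form the core; the remaining n_junk vertices are junk,
    so (A_ij, B_ij) are independent whenever i or j is a junk vertex.
    """
    n = n_core + n_junk
    R = np.zeros((n, n))
    R[:n_core, :n_core] = rho
    return R


def is_core_matching(perm, core):
    """True if the permutation fixes every core vertex, i.e. perm[i] == i for i in core."""
    perm, core = np.asarray(perm), np.asarray(core)
    return bool(np.all(perm[core] == core))
```

Under this reading, a permutation may shuffle the junk vertices arbitrarily and still be core matching, which is what allows arbitrary junk structure in the results of this section.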

Completely analogously to the setting considered in Example 2.2, it is immediate that $A$ and $B$ need not be asymptotically almost surely core -matchable even with non-vanishing core correlation in $R$. Indeed, as in Example 2.2, $P$ and $Q$ can be chosen to effectively obfuscate the true alignment among the core vertices. Mimicking the results of Theorem 2.3, centering $A$ and $B$ by $P$ and $Q$ again a.s. recovers core -matchability of $A$ and $B$ under mild model assumptions. The proof of Theorem 2.7 is contained in Appendix A.3.

Theorem 2.7

Let and consider and . Suppose that is of the form where , and for each , let be defined as in Eq. (2.4). If for all we have that

(2.9)

(2.10)

and also if , then

As before, if we define then for , . If, in addition,

then Eqs. (2.9) and (2.10) hold and core -matchability is recovered. In the event that is or then Theorem 2.7 implies that and are core -matchable in the presence of nearly linear junk, with arbitrary junk structure. This result extends and generalizes the results in [29] to the non-homogeneous ER setting.

As before, if the unknown $P$ and $Q$ are estimated via USVT, then we recover partial core matchability. Before formalizing this, we first need the following extension of Definition 2.4 to the core-junk setting.

Definition 2.8

Let be a dissimilarity. Let , and consider a partition of the vertex sets into core and junk vertices,

Define

We say that and are core -matchable if

The following theorem provides the analogue of Theorem 2.5 in the core-junk setting. Note that the proof is completely analogous to that of Theorem 2.5, and so is omitted.

Theorem 2.9

Let and consider and . Suppose that is of the form where . Under the assumptions on and from Theorem 2.5, further assume that for each ,

(2.11)

(2.12)

and also assume . For each , there exist constants such that if is the USVT estimate of with threshold level and if

then

3. Simulations and experiments

In the following sections, we explore the impact of USVT centering on graph matchability in both simulated and real data settings. We note here that precisely determining the level of , and -matchability is infeasible for even modestly sized networks, as this would require exactly solving the NP-hard GMP. To circumvent this, we instead match our networks using the FAQ algorithm of [54] initialized at a variety of starting points, including the true correspondence. As FAQ is a Frank–Wolfe [24] based algorithm, if it terminates at the true correspondence (or at a permutation which matches a high percentage of the vertices), then the true correspondence is an estimated local minimum of the GMP. Moreover, if FAQ is initialized at the true correspondence and does not terminate at the true correspondence, then the true correspondence is not a local minimum. Comparing objective function values across estimated local minima then allows us to approximately gauge the global optimality of the true correspondence. While this is not the same as directly finding the global minimum required in the definition of matchability, it nonetheless provides a useful, principled heuristic for empirically studying both matchability and deviations therefrom.

3.1 Simulation

To explore the utility of USVT centering as a graph matching preprocessing step, we consider the following experiment. We let $(A, B) \sim \mathrm{ER}(P, Q, R)$ with

and $R$ of the form

and we use the FAQ algorithm of [54] to match (i) $A$ and $B$ directly (labelled ‘Uncentered’ in Figs 2–3); (ii) $A - P$ and $B - Q$ (labelled ‘Centered’ in Figs 2–3); and (iii) $A - \hat{P}$ and $B - \hat{Q}$ (labelled ‘Approx. Centered’ in Figs 2–3). In each figure, we initialize the FAQ algorithm at $I_n$ (i.e., at the true latent alignment) and at the block-swapping permutation (i.e., at the alignment completely confusing blocks one and two across networks). We plot the mean fraction of vertices matched correctly (± s.d.) by FAQ at the starting point that achieves the lowest graph matching objective function score (averaged over Monte Carlo replicates). As mentioned above, if the fraction matched correctly is less than 1, then the true alignment is not a local minimum of the graph matching objective function and the graphs are not , , or -matchable (depending on which input FAQ is matching).
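The two-initialization protocol can be approximated with SciPy's implementation of FAQ, `scipy.optimize.quadratic_assignment(..., method="faq")`, which this sketch assumes is available (SciPy 1.6 or later). With `maximize=True` the objective $\operatorname{tr}(A \Pi B \Pi^T)$ is maximized, which for symmetric graphs is equivalent to minimizing $\|A - \Pi B \Pi^T\|_F$. The helper `faq_best_of` is hypothetical; it runs FAQ from each initial doubly stochastic matrix (here the identity and the block-swapping permutation matrix) and keeps the result with the smaller Frobenius objective. The graph pair in the usage lines is a toy noisy copy, not the simulation model of this section.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def faq_best_of(A, B, inits):
    """Run SciPy's FAQ from each doubly stochastic start in `inits`; keep the best.

    Returns (perm, value): vertex i of A is matched to vertex perm[i] of B, and
    value = ||A - Pi B Pi^T||_F for that matching.
    """
    best_perm, best_val = None, np.inf
    for P0 in inits:
        # maximize=True: maximise tr(A Pi B Pi^T), i.e. graph matching.
        res = quadratic_assignment(A, B, method="faq",
                                   options={"maximize": True, "P0": P0})
        perm = res.col_ind
        val = np.linalg.norm(A - B[np.ix_(perm, perm)])
        if val < best_val:
            best_perm, best_val = perm, val
    return best_perm, best_val


# Toy pair (hypothetical): B is A with roughly 5% of vertex pairs flipped.
rng = np.random.default_rng(0)
n, m = 100, 50
A = np.triu((rng.random((n, n)) < 0.3).astype(float), k=1)
A = A + A.T
flips = np.triu(rng.random((n, n)) < 0.05, k=1)
B = np.abs(A - (flips | flips.T))

# Start at the true alignment and at the permutation swapping the two halves.
swap = np.concatenate([np.arange(m, n), np.arange(m)])
inits = [np.eye(n), np.eye(n)[swap]]
perm, val = faq_best_of(A, B, inits)
frac_correct = np.mean(perm == np.arange(n))
```

Keeping the run with the lower objective mirrors the protocol described above: if even the run started at $I_n$ drifts away from it, the true alignment is not a local minimum.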

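Fig. 2. Fraction correctly matched by FAQ (± s.d.), optimized over two different initializations (the true alignment and the block-swapping permutation), vs. the correlation level when matching (i) $A$ and $B$ directly (labelled ‘Uncentered’); (ii) $A - P$ and $B - Q$ (labelled ‘Centered’); and (iii) $A - \hat{P}$ and $B - \hat{Q}$ (labelled ‘Approx. Centered’). Here, the horizontal-axis parameter captures the level of correlation between $A$ and $B$ (higher means more correlation), the remaining model parameter is fixed, and results are averaged over Monte Carlo replicates.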

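Fig. 3. Fraction of vertices correctly matched by `FAQ` (± s.d.), optimized over two different initializations (the identity and the block-swapping permutation), vs. the graph size when matching (i) the adjacency matrices directly (labelled ‘Uncentered’); (ii) the oracle-centered matrices (labelled ‘Centered’); and (iii) the USVT-centered matrices (labelled ‘Approx. Centered’). Here, the correlation is fixed and results are averaged over Monte Carlo replicates.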

In Fig. 2, we fix the graph size and, in the USVT estimates, use the threshold suggested in [9]. We plot the mean fraction of vertices matched correctly (± s.d.) vs. the correlation. In light of Example 2.2, it is unsurprising that the uncentered graphs are not matchable and, moreover, that the alignment found by FAQ matches none of the vertices correctly across graphs. Also of note in the figure is that, as the correlation increases, the oracle-centered graphs appear to be nearly matchable (in that the estimated local minimum found by FAQ is close to the true alignment). The steep drop-off in performance as the correlation decreases is a consequence of the fact that in low-correlation regimes (i.e., low correlation for a given graph size), matchability is often not recovered even through centering. Performance in the approximately centered case tracks performance in the centered case, with USVT centering recovering the gains of the oracle centering. This empirically suggests that the lower bound in Theorem 2.5 is not sharp, as USVT centering recovers full matchability in the high-correlation regime. We surmise that if the edge-probability matrix is truly low-rank, USVT centering and oracle centering will recover perfect matchability of the USVT-centered and oracle-centered graphs, respectively, as the graph size increases.
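A minimal sketch of USVT centering, after Chatterjee [9], is given below: keep the singular components of A whose singular values exceed a threshold, clip the resulting estimate to [0, 1], and subtract it from A. The default threshold (2 + η)√n with η = 0.01 is our assumption of a common choice for a fully observed matrix with entries in [0, 1], not necessarily the value used for the figures; the `hollow` flag (zeroing the diagonal of the estimate) mirrors the choice discussed for the real data examples. The function names are hypothetical.

```python
import numpy as np


def usvt_estimate(A, threshold=None, eta=0.01, hollow=True):
    """USVT estimate of the edge-probability matrix of A (entries in [0, 1])."""
    n = A.shape[0]
    if threshold is None:
        threshold = (2 + eta) * np.sqrt(n)   # assumed default threshold
    U, s, Vt = np.linalg.svd(A)
    keep = s >= threshold                     # singular values to retain
    P_hat = (U[:, keep] * s[keep]) @ Vt[keep, :]
    P_hat = np.clip(P_hat, 0.0, 1.0)          # project estimates back to [0, 1]
    if hollow:
        np.fill_diagonal(P_hat, 0.0)          # optionally zero the diagonal
    return P_hat


def usvt_center(A, **kwargs):
    """Center A by subtracting its USVT estimate."""
    return A - usvt_estimate(A, **kwargs)
```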

In Fig. 3, we repeat the above simulation with the correlation fixed and the graph size varying. Again applying USVT centering, we plot the mean fraction of vertices matched correctly (± s.d.) vs. the graph size. As before, without centering, the optimal alignment found by FAQ matches very few of the vertices correctly across graphs. The increase in performance in the oracle-centered setting as the graph size increases is a consequence of matchability (in Theorem 2.3) being an asymptotic result; indeed, we should not expect correlated small networks to be almost surely matchable, even with oracle centering. We note, however, that the larger graph orders considered here are sufficient for the asymptotically perfect matchability to be recovered. Again we see that performance in the approximately centered case tracks performance in the centered case, with USVT centering achieving almost all of the gains of the oracle centering.

3.2 Twitter data

In order to analyse the impact of USVT centering with real data, we considered two graphs derived from Twitter.¹ The two graphs are based on the most active Twitter users from April and May 2014. The graphs are unweighted, with an edge between two users if one user mentioned the other during the given month. After keeping only the largest common connected component, the number of users was 431 in each graph.

As can be seen in Fig. 4, many vertices in this data set have very similar connectivity patterns across the two months. Indeed, the empirical Pearson correlation between the entries in the two adjacency matrices is substantial. It is not surprising that the similar graph topology across networks leads to good performance when matching the graphs without centering. Repeating the experiment in Section 3.1 (i.e., matching the April and May adjacency matrices using FAQ initialized at the identity) yields a high fraction of vertices correctly aligned across networks. Although the true correspondence is not optimal (according to the GM objective function), the estimated locally optimal correspondence does match a high fraction of the vertices correctly across networks without the need for centering. It is worth noting that preprocessing the data via USVT centering before matching again yields a high fraction of vertices correctly matched by FAQ initialized at the identity (with a suitable USVT threshold). This suggests that the centering procedure does not hurt performance when the graph topologies are similar across networks and, as we demonstrate below, can significantly increase performance when the graph topologies differ across networks. We note that we do not zero out the diagonal of the USVT estimates in this real data example, as here hollow estimates led to significantly worse performance than non-hollow ones.
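The empirical edge correlation can be computed, as a sketch, by correlating corresponding entries of the two adjacency matrices. Using only the strict upper triangle (each unordered vertex pair once) is our assumption for undirected, hollow graphs; the paper does not state which entries were used, and the function name is hypothetical.

```python
import numpy as np


def edge_correlation(A, B):
    """Pearson correlation between corresponding off-diagonal entries of A and B."""
    iu = np.triu_indices_from(A, k=1)   # each unordered vertex pair once
    return np.corrcoef(A[iu], B[iu])[0, 1]
```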

While the uncentered and centered graphs are both highly matchable, centering does have a very interesting algorithmic effect here; see Fig. 5. In the figure, we plot the number of vertices in the April–May Twitter graphs correctly matched by FAQ vs. the graph matching objective function value. In each panel (on the left matching the uncentered graphs and on the right the USVT-centered graphs), we initialize FAQ at 100 different starting points: once at the identity (labelled ‘P0 = I’ in the legend) and the rest at random restarts (labelled ‘rand. start’ in the legend). The figure suggests that centering has the effect of creating a more stable objective function gap between the estimated optimal permutation and suboptimal alternatives. In a setting where multiple random restarts are possible (and needed) to recover an unknown latent alignment, this suggests that the optimal alignment may be more easily recognized in the centered graph regime, and hence that an online stopping criterion may be more easily implemented.
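The restart experiment and the objective gap it suggests can be sketched as below, assuming SciPy's FAQ with its `"randomized"` initialization as a stand-in for the random restarts used in the figure. The "gap" here is our own illustrative definition (the best objective among runs that returned a different permutation, minus the overall best objective), and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def faq_restarts(A, B, n_random=20, seed=0):
    """FAQ from the identity plus `n_random` randomized starts.

    Returns a list of (objective, n_correct) sorted by objective, and the gap
    between the best objective and the best objective attained by any run
    that returned a different permutation (inf if all runs agree).
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    starts = [np.eye(n)] + ["randomized"] * n_random
    runs = []
    for P0 in starts:
        res = quadratic_assignment(
            A, B, method="faq",
            options={"P0": P0, "maximize": True, "rng": rng},
        )
        perm = res.col_ind
        obj = np.linalg.norm(A - B[np.ix_(perm, perm)]) ** 2
        runs.append((obj, int(np.sum(perm == np.arange(n))), tuple(perm)))
    runs.sort(key=lambda r: r[0])
    best_obj, _, best_perm = runs[0]
    others = [obj for obj, _, perm in runs if perm != best_perm]
    gap = min(others) - best_obj if others else np.inf
    return [(obj, k) for obj, k, _ in runs], gap
```

Comparing `gap` for the uncentered and USVT-centered inputs is one way to quantify the "more stable objective function gap" described above.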

Fig. 5. We plot the number of vertices in the April–May Twitter graphs correctly matched by `FAQ` vs. the graph matching objective function value. In each panel (on the left matching the uncentered graphs and on the right the USVT-centered graphs), we initialize `FAQ` at 100 different starting points: once at the identity (labelled ‘P0 = I’ in the legend) and the rest at random restarts (labelled ‘rand. start’ in the legend).

To explore the performance of the two approaches (centering and not centering) when the network topologies differ, we consider the following synthetic data experiment. We choose a set of random users from the Twitter networks and add the adjacency matrix of an independent Erdős–Rényi graph to their induced subgraph in the May network (followed by again binarizing); i.e., only the block of the May adjacency matrix indexed by the randomly chosen vertices is perturbed, and the perturbed matrix is binarized before matching or centering. This experiment simulates the setting where a fraction of the network changes its behaviour from April to May, in this case by increasing their volume of mentions month to month. For each perturbation level, we repeat this experiment multiple times and plot the mean accuracy (±1 s.d.) of graph matching using FAQ initialized at the identity, both with and without USVT centering; see Fig. 6. This experiment demonstrates the capacity of USVT centering to maintain matchability in the face of additive deviations in the network structure. These deviations alter the graph topology month to month and, with enough signal, have a precipitously negative impact on the performance of matching without centering. Centering ameliorates this effect and emphasizes the common signal in the networks by removing the effect of this additive noise. It is interesting to note that for small perturbations, centering negatively impacts algorithmic performance. We view this as potentially an artifact of the noise in these settings not being sufficient to obscure the true matching without centering.
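The perturbation step can be sketched as follows. The number of perturbed vertices `k`, the edge probability `q` of the added Erdős–Rényi noise and the function name are hypothetical placeholders; the essential steps are (1) choose random vertices, (2) add an independent ER adjacency matrix on their induced subgraph and (3) binarize.

```python
import numpy as np


def perturb_subgraph(B, k, q, rng):
    """Add ER(q) noise to the subgraph of B induced by k random vertices, then binarize."""
    n = B.shape[0]
    S = rng.choice(n, size=k, replace=False)                   # randomly chosen vertices
    noise = np.triu((rng.random((k, k)) < q).astype(B.dtype), 1)
    noise = noise + noise.T                                    # symmetric, hollow ER(q) graph
    B_new = B.copy()
    B_new[np.ix_(S, S)] += noise                               # perturb only the S x S block
    B_new = (B_new > 0).astype(B.dtype)                        # binarize (an entry of 2 -> 1)
    return B_new, S
```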

Fig. 6. In the left panel, we plot the average matching accuracy (± s.d.) of graph matching using `FAQ` initialized at the identity when first choosing random vertices from the May graph and then adding independent Erdős–Rényi noise to their induced subgraph (where the result is again binarized after the noise is added) before matching and centering. Accuracy is plotted vs. the perturbation level. In the right two panels, we plot the degree of each vertex in April vs. the degree of the same vertex in May (with the substitution).

In the core-junk setting, the heterogeneity among the junk vertices offers a further setting for demonstrating the utility of USVT-centered graph matching. To see this, we consider the following experiment: from the Twitter network, choose a set of uniformly sampled core vertices, a set of uniformly sampled junk vertices for the April graph and a set of uniformly sampled junk vertices for the May graph. As before, we match the resulting April and May graphs using FAQ initialized at the identity, both with and without USVT centering; results are summarized in Fig. 7. As seen previously, the ability of USVT centering to ameliorate the degree/distributional heterogeneity (here among the junk vertices) leads to superior core label recovery compared to the uncentered matching setting.
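The core-junk construction and the core accuracy metric can be sketched as below. Listing the core vertices first in both graphs, so that the true core alignment is the identity on the first `n_core` labels, is a bookkeeping convenience of this sketch; drawing the two junk sets disjoint from each other is our assumption, and the function names are hypothetical.

```python
import numpy as np


def core_junk_pair(A_full, B_full, n_core, n_junk, rng):
    """Induced subgraphs on core + junk_a (April) and core + junk_b (May).

    Core vertices come first in both graphs; the junk sets are drawn disjoint
    from each other and from the core (an assumption of this sketch).
    """
    idx = rng.permutation(A_full.shape[0])
    core = idx[:n_core]
    junk_a = idx[n_core:n_core + n_junk]
    junk_b = idx[n_core + n_junk:n_core + 2 * n_junk]
    va = np.concatenate([core, junk_a])
    vb = np.concatenate([core, junk_b])
    return A_full[np.ix_(va, va)], B_full[np.ix_(vb, vb)]


def core_accuracy(perm, n_core):
    """Fraction of core vertices mapped to their true (identity) match."""
    return np.mean(np.asarray(perm)[:n_core] == np.arange(n_core))
```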

Fig. 7. We plot the average core matching accuracy (± s.d.) for the uncentered and USVT-centered graphs using `FAQ` initialized at the identity. Results are averaged over Monte Carlo iterates.

3.3 Connectomes

For our next example, we consider the diffusion MRI data from [31]. The dataset consists of test–retest pairs (used to evaluate reproducibility of the magnetization prepared rapid acquisition gradient echo image protocol). Each scan is converted into a weighted connectome by considering 70 brain regions of interest (labelled according to the Desikan brain atlas [16]) as the vertices, with edge weights counting the number of neural fiber bundles connecting the regions. See [46,47] for more detail on how these graphs were constructed. As vertices correspond to canonical brain regions of interest, it is natural to consider the true correspondence across graphs as being given by the identity mapping.

To illustrate the role of USVT centering in this data set, we first consider as an example a pair of graphs generated as above from the data in [31]. The respective adjacency matrices for this graph pair are shown in Fig. 8. Matching these brains directly using FAQ initialized at the identity yields an estimated local optimum with 65 vertices correctly aligned across graphs; indeed, by permuting a few vertices, we obtain a better objective function value than the GMP objective evaluated at the identity. We seek to understand the ability of USVT centering, which is global in nature, to correct these local mismatches.
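To make the notion of a local mismatch concrete, the sketch below checks whether any single transposition of two vertices lowers the objective ‖A − PBPᵀ‖²_F relative to the identity. It is an illustrative check under our own naming (the helpers are hypothetical), not the procedure used to produce the numbers above.

```python
import numpy as np


def gm_objective(A, B, perm):
    """||A - P B P^T||_F^2, where vertex i of A is matched to vertex perm[i] of B."""
    return np.linalg.norm(A - B[np.ix_(perm, perm)]) ** 2


def improving_transpositions(A, B):
    """Vertex pairs whose swap lowers the GM objective relative to the identity."""
    n = A.shape[0]
    base = gm_objective(A, B, np.arange(n))
    better = []
    for i in range(n):
        for j in range(i + 1, n):
            perm = np.arange(n)
            perm[[i, j]] = perm[[j, i]]          # transpose vertices i and j
            gain = base - gm_objective(A, B, perm)
            if gain > 1e-9:
                better.append((i, j, gain))
    return better
```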

Fig. 8. — Adjacency matrices of two sample brains from the dataset in [31].

To study this further, we apply a variant of the USVT procedure in which we automatically select the number of singular values to retain by combining the ideas of USVT with the profile likelihood work of [63]; to wit, we select the threshold dimension via an elbow analysis of the scree plot of the singular values. We chose this automated procedure rather than setting a singular value threshold because these graphs are weighted, and the common threshold from [9,55] is presented for the unweighted setting. Centering the pair of graphs from Fig. 8 recovers the identity as an estimated local minimum of the GMP, so the global centering corrects the localized mismatch.
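One way to implement this elbow-based variant is sketched below. The profile-likelihood criterion of [63] splits the ordered singular values into two groups, modelled as Gaussians with different means and a common variance, and selects the split point that maximizes the likelihood; the leading components up to the elbow then form the estimate that is subtracted. The pooled-variance form and the decision not to clip the estimate for weighted graphs are our assumptions about details not given above, and the function names are hypothetical.

```python
import numpy as np


def profile_likelihood_elbow(values):
    """Elbow of a scree plot via a two-group Gaussian profile likelihood (after [63])."""
    d = np.sort(np.asarray(values, dtype=float))[::-1]
    p = d.size
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):                     # first group d[:q], second group d[q:]
        g1, g2 = d[:q], d[q:]
        ss = np.sum((g1 - g1.mean()) ** 2) + np.sum((g2 - g2.mean()) ** 2)
        var = max(ss / max(p - 2, 1), 1e-12)  # pooled variance estimate
        ll = -0.5 * p * np.log(2 * np.pi * var) - ss / (2 * var)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


def elbow_center(A):
    """Subtract the rank-q truncated SVD of A, with q chosen at the elbow."""
    U, s, Vt = np.linalg.svd(A)
    q = profile_likelihood_elbow(s)
    P_hat = (U[:, :q] * s[:q]) @ Vt[:q, :]
    return A - P_hat, q
```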

Extending this to a 41-scan sample from [31] (hence, we consider 41 graphs, each with 70 vertices), we run FAQ initialized at the identity for all pairs of distinct graphs, both with USVT centering and with no centering. When matching graph i and graph j (i ≠ j), we record the number of vertices correctly matched with USVT centering and the number correctly matched without centering.

In Fig. 9, we plot a heatmap of the differences between these two counts (centered minus uncentered), so that the (i, j)th entry in the heatmap corresponds to the excess number of correct matches achieved by USVT centering. Red values indicate more vertices correctly matched via centering and blue values indicate more vertices correctly matched via no centering. The colour intensity indicates the magnitude of the difference, with darker colours indicating more (in the red case) or fewer (in the blue case) vertices correctly matched after centering. The figure demonstrates that the phenomenon observed in the graphs in Fig. 8 was not an anomaly. Only two pairs see an improvement in matching accuracy when not centering, while far more pairs see an improvement in matching accuracy when USVT centering. Moreover, while many of the mismatches are local in nature, they are nonetheless ameliorated by the global USVT centering procedure.
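The heatmap entries and the summary counts can be computed as sketched below. Here `n_correct(Ai, Aj, centered)` is a hypothetical user-supplied function that matches graph i to graph j (for example, FAQ from the identity, with or without centering) and returns the number of correctly matched vertices; treating (i, j) and (j, i) as separate ordered pairs is our assumption.

```python
import numpy as np


def improvement_matrix(graphs, n_correct):
    """delta[i, j] = (# correct with centering) - (# correct without centering)."""
    m = len(graphs)
    delta = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(m):
            if i != j:
                delta[i, j] = (n_correct(graphs[i], graphs[j], True)
                               - n_correct(graphs[i], graphs[j], False))
    return delta


def summarize(delta):
    """Counts of ordered pairs improved by centering vs. improved by not centering."""
    return int(np.sum(delta > 0)), int(np.sum(delta < 0))
```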

If we run the same experiment on the unweighted brain graphs (using USVT centering with a singular value threshold), we see the delicate nature of the USVT threshold in data applications (see Fig. 10). We note here that, again, in this unweighted case we do not zero out the diagonal of the USVT estimates in this real data example. For each of the two thresholds shown in Fig. 10, we count the pairs that achieved improved matching performance when USVT centering first and the pairs that achieved improved matching performance when not centering, and these counts depend strongly on the threshold. These results suggest two important take-aways: first, performance is intimately tied to proper thresholding; and second, in this example, USVT centering is more effective in the weighted edge case. This suggests both that the magnitudes of the weights contribute significantly to the mismatch and that USVT centering is effective at ameliorating this edge weight heterogeneity; this is not entirely unexpected, as the centering is precisely trying to eliminate the different edge probability/weight structures across the graphs.

Fig. 10. For the *unweighted* brain pairs, we plot a heatmap of the differences in the number of correctly matched vertices (USVT centered minus uncentered), so that the (i, j)th entry in the heatmap corresponds to the excess number of correct matches achieved by USVT centering. Red values indicate more vertices correctly matched via centering and blue values indicate more vertices correctly matched via no centering. The colour intensity indicates the magnitude of the difference, with darker colours indicating more (in the red case) or fewer (in the blue case) vertices correctly matched after centering. The left and right panels use two different USVT thresholds.

4. Discussion

Understanding the limits of matchability is an essential step in robust multiple graph inference regimes. When graphs are not matchable (i.e., the true node correspondence cannot be recovered in the face of noise), paired graph inference methodologies that utilize the across-graph correspondence (see, for example, [2,50]) cannot gainfully be employed. Non-matchability can limit analysis to methods that rely on graph statistics invariant to relabelling of the vertices; such methods can be useful, but lack the full power of their parametric counterparts (with the labelling as a parameter). In this paper, we establish initial theoretical results on matchability when the graphs to be matched differ in distribution, and when only a fraction of the vertices correspond across the graphs.

While our theoretical results and subsequent simulations and experiments provide a basis for a deeper understanding of the effect that distributional heterogeneity has on matchability, there is still much to be done. For example, our present USVT centering step does not utilize the across-graph correlation. We suspect that the error in the USVT steps could be greatly reduced by leveraging this correlation or an estimate thereof. We are also exploring graph normalization strategies other than centering, such as kernel smoothing (using a small number of a priori known correspondences to choose the proper smoothing kernels), which may be more appropriate in the presence of multiplicative or other nonlinear noise structures. We suspect that the growth rate required in Theorem 2.5 is not sharp, as in simulation and real data settings the true correspondence is almost perfectly recovered via USVT centering before matching; however, sharpening this lower bound with our present methods does not seem feasible, and new ideas and techniques are needed.

We suspect that the bounds obtained in Theorem 2.9 are also suboptimal. As an informal argument, we can consider known results for the quadratic assignment problem with i.i.d. entries [8]. In the uncorrelated dense homogeneous Erdős–Rényi setting, these results imply that the best solution to the GMP in Eq. (1.1) will reduce the objective function as compared to a random guess. On the other hand, in the dense homogeneous Erdős–Rényi setting with constant correlation, the best solution is better than a random guess. Under the heuristic that, in order for the core vertices to be matched correctly, the signal from correlation must be greater than the possible improvements in the all-noise setting, we can conjecture that a sharper core matchability threshold may be possible. Using our present proof technique, we are unable to achieve such a rate. This heuristic argument, though problematic, provides a potential guidepost for future work.

In addition, while the theoretical results presented herein are for simple undirected graphs without edge weights, the graph matching framework of Definition 1.1 is flexible, allowing us to accommodate many of the features (both those considered above and additional eccentricities) inherent to real data settings. In the weighted, loopy setting, the matching can occur between the weighted adjacency matrices or between the normalized Laplacian matrices of the two graphs (each defined via the diagonal matrix of vertex degrees). To match directed graphs, the graphs can either be made undirected (for example, by matching symmetrized adjacency matrices) or the directed adjacency matrices can be directly plugged into Eq. (1.1). Developing results similar to Theorems 2.5 and 2.9 in models (akin to our CorrER model) that incorporate these graph features, as well as additional vertex and edge features, is a natural next step.
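A sketch of these two preprocessing choices follows. It assumes the common form D^{-1/2} A D^{-1/2} for the normalized Laplacian, with D the diagonal matrix of (weighted) row sums, and A + Aᵀ as one possible symmetrization; both are our assumptions, since the exact expressions are not given above. The variant I − D^{-1/2} A D^{-1/2} yields the same matching problem, because (I − X)P − P(I − Y) = −(XP − PY) for any permutation matrix P. The function names are hypothetical.

```python
import numpy as np


def normalized_laplacian(A):
    """D^{-1/2} A D^{-1/2}, with D the diagonal matrix of (weighted) row sums.

    Isolated vertices (zero degree) are left as zero rows and columns.
    """
    d = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def symmetrize(A):
    """One way to make a directed graph undirected: A + A^T."""
    return A + A.T
```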

The information-theoretic and computational limits for matchability are still open problems; we have pushed the boundaries here, but significantly more work remains. These problems are analogous to the recently addressed problems of detection and recovery for the planted partition and planted clique problems for a single graph [1,22,42]. For these settings, exact fundamental limits have been established, and polynomial-time algorithms have been shown to achieve or nearly achieve these limits. Obtaining similar results for the GMP is a key step towards a robust statistical framework for multiple graph inference.

Acknowledgements

This material is based on research sponsored by the Air Force Research Laboratory and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8750-18-2-0035. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA or the US Government. We would also like to thank Prof. Carey E. Priebe, Prof. Minh Tang and Joshua Cape for their helpful discussions in the writing of this manuscript.

Funding

MIT Lincoln Labs and the Department of Defense (for Dr. Sussman); NIH (for Dr. Lyzinski, via grant BRAIN U01-NS108637); Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0035.

Appendix A. Proofs

Herein, we collect proofs of the main theoretical results in this manuscript. Before stating our proofs, we first recall some well-known facts about the bivariate Bernoulli distribution. In particular, if the graph pair is drawn from the CorrER model, then for each vertex pair, the corresponding entries of the two adjacency matrices can be realized as a bivariate Bernoulli random variable. This will be a key insight in the proofs of our main results, Theorems 2.3 and 2.5.

Spelling this out further, a pair of Bernoulli random variables has a BiBernoulli distribution with parameters

(A.1)

if its joint probabilities are as specified in (A.1). A key property of BiBernoulli random variables is that they can be generated by a triple of independent Bernoulli random variables. For a pair as above, defining the pair through three suitably parametrized independent Bernoulli random variables yields

In the CorrER model, we then have that

where

(A.2)

are independent Bernoulli random variables.
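A concrete instance of the triple-of-Bernoullis construction is sketched below. It assumes the common parametrization X = Z₁ and Y = Z₁Z₂ + (1 − Z₁)Z₃, where Z₂ and Z₃ encode the conditional probabilities of Y given X; we do not claim this matches the omitted displays term by term. Given marginals p, q and correlation ρ, the joint probability is P(X = 1, Y = 1) = pq + ρ√(p(1 − p)q(1 − q)). The function name is hypothetical.

```python
import numpy as np


def correlated_bernoulli_pair(p, q, rho, size, rng):
    """Sample (X, Y), X ~ Bern(p), Y ~ Bern(q), corr(X, Y) = rho, from three
    independent Bernoulli variables: X = Z1, Y = Z2 if Z1 = 1 else Z3."""
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("p and q must lie strictly between 0 and 1")
    r = p * q + rho * np.sqrt(p * (1 - p) * q * (1 - q))  # P(X = 1, Y = 1)
    b = r / p                 # P(Y = 1 | X = 1), parameter of Z2
    c = (q - r) / (1 - p)     # P(Y = 1 | X = 0), parameter of Z3
    if not (0 <= b <= 1 and 0 <= c <= 1):
        raise ValueError("(p, q, rho) is not a valid BiBernoulli parametrization")
    Z1 = rng.random(size) < p
    Z2 = rng.random(size) < b
    Z3 = rng.random(size) < c
    X = Z1
    Y = np.where(Z1, Z2, Z3)
    return X.astype(int), Y.astype(int)
```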

A.1 Proof of Theorem 2.3

The key to the proof of Theorem 2.3 is the well-known McDiarmid’s inequality [41].

Proposition A.1 (McDiarmid’s inequality). —

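Let $X_1, X_2, \ldots, X_m$ be a sequence of independent random variables taking values in a set $\mathcal{X}$. Let $f : \mathcal{X}^m \to \mathbb{R}$ be such that, for all $i \in \{1, \ldots, m\}$, there exists $c_i \ge 0$ such that for all $x_1, \ldots, x_m, x_i' \in \mathcal{X}$

$$\left| f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m) \right| \le c_i.$$

If $Y = f(X_1, \ldots, X_m)$, then for any $t > 0$,

$$\mathbb{P}\left(Y - \mathbb{E}Y \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^m c_i^2}\right), \qquad \mathbb{P}\left(Y - \mathbb{E}Y \le -t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^m c_i^2}\right).$$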

□
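As a quick numerical illustration of Proposition A.1 (not part of the proof), take f to be the mean of m independent Bernoulli(p) variables, so that changing one coordinate moves f by at most c_i = 1/m and the bound becomes exp(−2mt²). The sketch compares this bound with a Monte Carlo estimate of the upper tail; the parameter values and the function name are arbitrary choices of ours.

```python
import numpy as np


def mcdiarmid_demo(m=100, p=0.5, t=0.1, n_sims=20_000, seed=0):
    """Monte Carlo upper-tail probability of a Bernoulli mean vs. McDiarmid's bound."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_sims, m)) < p          # n_sims independent draws of (X_1, ..., X_m)
    f = X.mean(axis=1)                       # bounded differences: c_i = 1/m
    empirical = np.mean(f - p >= t - 1e-12)  # E f = p; tolerance guards float round-off
    c = np.full(m, 1.0 / m)
    bound = np.exp(-2 * t ** 2 / np.sum(c ** 2))
    return empirical, bound
```

With the defaults the bound equals exp(−2) ≈ 0.135, comfortably above the simulated tail probability.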

Proof of Theorem 2.3. Let , and consider and .

If and are correlated Bernoulli random variables with respective parameters and , then it follows that

Let be a permutation matrix that permutes exactly labels and let be the associated permutation for . Note that

satisfies

where

is the number of transpositions induced by . Note that , and so

(A.3)

For each , define the matrix as in Eq. (2.3). We then have

(A.4)

For ease of notation, we define

so that

As , and each pair is a function of three independent Bernoulli random variables (those from Eq. (A.2), which are independent by construction), we have that is a function of

and so is a function of independent Bernoulli random variables, where then satisfies

Changing the value of one of these Bernoulli random variables while leaving all others fixed can change the value of at most one pair, and this pair appears in two terms in the sum. As each term in the sum is bounded, we have that, in the notation of Proposition A.1, each can be uniformly set to . Proposition A.1 then yields (setting )

(A.5)

with being an appropriate positive constant that may change from line to line. If then

(A.6)

To finish the proof, we apply a union bound over all such permutation matrices. The number of such permutations that permute vertex labels is upper bounded by . Combining this with Eq. (A.6), we have that under the assumptions of the theorem

as desired.□
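The counting step behind the union bound can be checked directly. A permutation of n labels that moves exactly k of them is determined by choosing the k moved labels and a derangement of them, giving C(n, k)·D_k such permutations, which is at most (nᵏ/k!)·k! = nᵏ. The sketch verifies the count by brute force for small n; the bound nᵏ is a standard choice and our assumption for the omitted expression, and the helper names are hypothetical.

```python
from itertools import permutations
from math import comb


def derangements(k):
    """Number of permutations of k labels with no fixed point (D_0 = 1, D_1 = 0)."""
    d = [1, 0]
    for i in range(2, k + 1):
        d.append((i - 1) * (d[i - 1] + d[i - 2]))
    return d[k]


def count_moving_exactly(n, k):
    """Permutations of n labels that move exactly k labels: C(n, k) * D_k."""
    return comb(n, k) * derangements(k)


def brute_force_count(n, k):
    """Direct enumeration, feasible only for small n."""
    return sum(1 for s in permutations(range(n))
               if sum(s[i] != i for i in range(n)) == k)
```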

A.2 Proof of Theorem 2.5

Key to the proof of Theorem 2.5 are the following lemmas, adapted here from [55, Lemmas 1 and 2].

Lemma A.1

Let . Suppose for fixed. Let be the singular value decomposition of , and let

Then

where are the singular values of .

Lemma A.2

Let ER(); i.e., is a hollow, symmetric matrix with . Assume for some . If for a constant , then for all there exists a constant such that

Note first that

with the analogous result holding for . Under the assumptions of Theorem 2.5, Lemma A.2 implies that there exist constants such that

with probability at least , and therefore there exist constants such that

with probability at least .

Next, we apply Lemma A.1 with , (resp., , ). With probability at least , there exist constants such that if then (where is as defined in the USVT pseudocode, Algorithm 1)

where the equality follows from the rank assumption in Theorem 2.5; similarly, for we have

Combining the above, we have that there exists an event such that , and on ,

To prove Theorem 2.5, we proceed as follows. Fix so that ; i.e., permutes exactly labels. Two simple applications of the triangle inequality yield that

and

Combining the above, we have that

In the proof of Theorem 2.3, if we set in Eq. (A.5) when applying McDiarmid’s inequality, then under the assumptions of Theorem 2.5, there exists an event with such that on ,

Next, note that

Hoeffding’s inequality (see, for example, [11]) yields that

(A.7)

(A.8)

with probability at least

(A.9)

(where the last inequality follows from the assumptions of the theorem, as under these assumptions ). Therefore there exists an event such that Eqs. (A.7) and (A.8) hold on , and

Writing

we see that

Then, on ,

(A.10)

Let be such that

If for , on , where

Define the event

and note that

Combined, this yields

as desired.□

A.3 Proof of Theorem 2.7

For a given permutation on , we define the permutation uniquely as follows:

(A.11)

For example, if , , and

then

For a permutation matrix , we define analogously, where we recall here that is the set of permutation matrices in satisfying (i.e., fixing all core labels). Define . Define the events

is the event that the optimal GMP permutation is not in and the graphs are not core matchable for . As , we have that .

Suppose that (with corresponding permutation ) permutes core labels, where

(A.12)

(A.13)

Applying the results in Appendix A on the bivariate Bernoulli distribution, we see that is a function of independent Bernoulli random variables, where

As in the proof of Theorem 2.3, we next apply Proposition A.1 to bound the probability that provides a better matching than . By the assumption that if either or is a junk vertex, it holds that

To ease notation, we define , so that

To use a union bound, note that the number of permutations with error counts in Eqs. (A.12)–(A.13) given by and is bounded above by . Let be the number of core vertices permuted by . Hence, if and

then we have that

as desired.□

Footnotes

These graphs were provided as part of the DARPA XDATA project.

Contributor Information

Vince Lyzinski, Email: vlyzinsk@umd.edu.

Daniel L Sussman, Email: sussman@bu.edu.

References

1. Abbe E. & Sandon C. (2015) Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery. 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. Washington, DC, USA: IEEE Computer Society, pp. 670–688. [Google Scholar]
2. Asta D. & Shalizi C. (2015) Geometric network comparisons. Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI’15. Arlington, Virginia, USA: AUAI Press, pp. 102–110. [Google Scholar]
3. Babai L. (2016) Graph isomorphism in quasipolynomial time. Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. New York, NY, USA: ACM, pp. 684–697. [Google Scholar]
4. Barak B., Chou C., Lei Z., Schramm T. & Sheng Y. (2019) (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. Advances in Neural Information Processing Systems. 9186–9194
5. Bayati M., Gerritsen M., Gleich D. F., Saberi A. & Wang Y. (2009) Algorithms for large, sparse network alignment problems. 2009 Ninth IEEE International Conference on Data Mining. Miami, FL: IEEE, pp. 705–710. [Google Scholar]
6. Bollobás B. (2001) Random Graphs. Cambridge University Press. [Google Scholar]
7. Bougleux S., Brun L., Carletti V., Foggia P., Gaüzère B. & Vento M. (2017) Graph edit distance as a quadratic assignment problem. Pattern Recogn. Lett., 87, 38–46. [Google Scholar]
8. Cela E. (2011) The Quadratic Assignment Problem: Theory and Algorithms; Combinatorial Optimization, vol. 1. New York, NY: Springer. [Google Scholar]
9. Chatterjee S. (2015) Matrix estimation by universal singular value thresholding. Ann. Stat., 43, 177–214. [Google Scholar]
10. Chen L., Vogelstein J. T., Lyzinski V. & Priebe C. E. (2016) A joint graph inference case study: the c. elegans chemical and electrical connectomes. Worm, vol. 5. [DOI] [PMC free article] [PubMed]
11. Chung F. & Lu L (2006) Concentration inequalities and martingale inequalities: a survey. Int. Math., 3, 79–127. [Google Scholar]
12. Conte D., Foggia P., Sansone C. & Vento M. (2004) Thirty years of graph matching in pattern recognition. Int. J. Pattern Recogn. Artificial Intell., 18, 265–298. [Google Scholar]
13. Cullina D. & Kiyavash N. (2016) Improved achievability and converse bounds for Erd̋s–Rényi graph matching. ACM SIGMETRICS Performance Evaluation Review, vol. 44 New York, NY, USA: ACM, pp. 63–72. [Google Scholar]
14. Cullina D. & Kiyavash N. (2017) Exact alignment recovery for correlated Erd̋s–Rényi graphs. arXiv preprint arXiv:1711.06783.
15. Cullina D., Kiyavash N., Mittal P. & Poor H. V. (2018) Partial recovery of Erd̋s–Rényi graph alignment via -core alignment. arXiv preprint arXiv:1809.03553.
16. Desikan R. S., Ségonne F., Fischl B., Quinn B. T., Dickerson B. C., Blacker D., Buckner R. L., Dale A. M., Maguire R. P. & Hyman B. T (2006) An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage, 31, 968–980. [DOI] [PubMed] [Google Scholar]
17. Ding J., Ma Z., Wu Y. & Xu J. (2018) Efficient random graph matching via degree profiles. arXiv preprint arXiv:1811.07821.
18. Elmsallati A., Clark C. & Kalita J (2016) Global alignment of protein-protein interaction networks: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform., 13, 689–705. [DOI] [PubMed] [Google Scholar]
19. Emmert-Streib F., Dehmer M. & Shi Y. (2016) Fifty years of graph matching, network alignment and network comparison. Inform. Sci., 346–347, 180–197. [Google Scholar]
20. Escolano F., Hancock E. R. & Lozano M. (2011) Graph matching through entropic manifold alignment. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE Computer Society, pp. 2417–2424. [Google Scholar]
21. Fang F., Sussman D. L. & Lyzinski V. (2018) Tractable graph matching via soft seeding. arXiv preprint arXiv:1807.09299.
22. Feldman V., Grigorescu E., Reyzin L., Vempala S. & Xiao Y. (2013) Statistical algorithms and a lower bound for detecting planted cliques. Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC’13. New York, NY, USA: ACM, pp. 655–664. [Google Scholar]
23. Foggia P., Percannella G. & Vento M. (2014) Graph matching and learning in pattern recognition in the last 10 years. Int. J. Pattern Recogn. Artificial Intell., 28, 1450001. [Google Scholar]
24. Frank M. & Wolfe P. (1956) An algorithm for quadratic programming. Nav. Res. Logist. Q., 3, 95–110. [Google Scholar]
25. Heimann M., Shen H., Safavi T. & Regal D. K. (2018) Representation learning-based graph alignment. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. New York, NY, USA: ACM, pp. 117–126. [Google Scholar]
26. Hoff P. D., Raftery A. E. & Handcock M. S. (2002) Latent space approaches to social network analysis. J. Amer. Statist. Assoc., 97,1090–1098. [Google Scholar]
27. Holland P. W., Laskey K. B. & Leinhardt S. (1983) Stochastic blockmodels: First steps. Soc. Netw., 5,109–137. [Google Scholar]
28. Horn R. A. & Johnson C. R. (2012) Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press. [Google Scholar]
29. Kazemi E., Yartseva L. & Grossglauser M. (2015) When can two unlabeled networks be aligned under partial overlap? 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). Washington, DC, USA: IEEE Computer Society, pp. 33–42. [Google Scholar]
30. Klau G. W. (2009) A new graph-based method for pairwise global network alignment. BMC Bioinform., 10, S59. [DOI] [PMC free article] [PubMed] [Google Scholar]
31. Landman B. A., Huang A. J., Gifford A., Vikram D. S., Lim I. A. L., Farrell J. A. D., Bogovic J. A., Hua J., Chen M. & Jarso S. (2011) Multi-parametric neuroimaging reproducibility: a 3-t resource study. Neuroimage, 54, 2854–2866. [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Le C. M., Levina E. & Vershynin R. (2017) Concentration and regularization of random graphs. Random Struct. Algor., 51,538–561. [Google Scholar]
33. Lee J., Cho M. & Lee K. M. (2010) A graph matching algorithm using data-driven markov chain monte carlo sampling. 2010 20th International Conference on Pattern Recognition (ICPR). Washington, DC, USA: IEEE Computer Society, pp. 2816–2819. [Google Scholar]
34. Li L. & Campbell W. M. (2015) Matching community structure across online social networks. NIPS Workshop on Networks in the Social and Information Sciences. Montreal, Quebec, Canada.
35. Lin L., Liu X. & Zhu S.-C. (2010) Layered graph matching with composite cluster sampling. IEEE Trans. Pattern Anal. Mach. Intell., 32, 1426–1442.
36. Loiola E. M., de Abreu N. M. M., Boaventura-Netto P. O., Hahn P. & Querido T. (2007) A survey for the quadratic assignment problem. Eur. J. Oper. Res., 176, 657–690.
37. Lyzinski V. (2018) Information recovery in shuffled graphs via graph matching. IEEE Trans. Inform. Theory, 64, 3254–3273.
38. Lyzinski V., Fishkind D. E., Fiori M., Vogelstein J. T., Priebe C. E. & Sapiro G. (2016) Graph matching: relax at your own risk. IEEE Trans. Pattern Anal. Mach. Intell., 38, 60–73.
39. Lyzinski V., Fishkind D. E. & Priebe C. E. (2014) Seeded graph matching for correlated Erdős–Rényi graphs. J. Mach. Learn. Res., 15, 3513–3540.
40. Lyzinski V., Levin K. & Priebe C. E. (2019) On consistent vertex nomination schemes. J. Mach. Learn. Res., 20, 1–39.
41. McDiarmid C. (1989) On the method of bounded differences. Surveys Combin., 141, 148–188.
42. Mossel E., Neeman J. & Sly A. (2014) Belief propagation, robust reconstruction and optimal recovery of block models. Proceedings of The 27th Conference on Learning Theory, pp. 356–370.
43. Onaran E., Garg S. & Erkip E. (2016) Optimal de-anonymization in random graphs with community structure. 2016 50th Asilomar Conference on Signals, Systems and Computers. Washington, DC, USA: IEEE Computer Society, pp. 709–713.
44. Pedarsani P. & Grossglauser M. (2011) On the privacy of anonymized networks. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, pp. 1235–1243.
45. Robles-Kelly A. & Hancock E. R. (2007) A Riemannian approach to graph embedding. Pattern Recogn., 40, 1042–1056.
46. Gray W. R., Bogovic J. A., Vogelstein J. T., Landman B. A., Prince J. L. & Vogelstein R. J. (2012) Magnetic resonance connectome automated pipeline: an overview. IEEE Pulse, 3, 42–48.
47. Roncal W. R., Koterba Z. H., Mhembere D., Kleissas D. M., Vogelstein J. T., Burns R., Bowles A. R., Donavos D. K., Ryman S. & Jung R. E. (2013) MIGRAINE: MRI graph reliability analysis and inference for connectomics. IEEE Global Conf. Signal Inform. Process., 313–316.
48. Sang J. & Xu C. (2012) Robust face-name graph matching for movie character identification. IEEE Trans. Multimedia, 14, 586–596.
49. Sussman D. L., Lyzinski V., Park Y. & Priebe C. E. (2019) Matched filters for noisy induced subgraph detection. IEEE Trans. Pattern Anal. Mach. Intell., 1–1.
50. Tang M., Athreya A., Sussman D. L., Lyzinski V., Park Y. & Priebe C. E. (2017) A semiparametric two-sample hypothesis testing problem for random dot product graphs. J. Comput. Graph. Statist., 26, 344–354.
51. Trosset M. W., Priebe C. E., Park Y. & Miller M. I. (2008) Semisupervised learning from dissimilarity data. Comput. Statist. Data Anal., 52, 4643–4657.
52. Udell M. & Townsend A. (2017) Nice latent variable models have log-rank. arXiv preprint arXiv:1705.07474.
53. Umeyama S. (1988) An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10, 695–703.
54. Vogelstein J. T., Conroy J. M., Lyzinski V., Podrazik L. J., Kratzer S. G., Harley E. T., Fishkind D. E., Vogelstein R. J. & Priebe C. E. (2014) Fast approximate quadratic programming for graph matching. PLoS ONE, 10(4).
55. Xu J. (2017) Rates of convergence of spectral methods for graphon estimation. arXiv preprint arXiv:1709.03183.
56. Yartseva L. & Grossglauser M. (2013) On the performance of percolation graph matching. Proceedings of the First ACM Conference on Online Social Networks. New York, NY, USA: ACM, pp. 119–130.
57. Young S. & Scheinerman E. (2007) Random dot product graph models for social networks. Proceedings of the 5th International Conference on Algorithms and Models for the Web-Graph. Berlin, Heidelberg: Springer, pp. 138–149.
58. Zaslavskiy M., Bach F. & Vert J. P. (2009) A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell., 31, 2227–2242.
59. Zhang S. & Tong H. (2016) FINAL: Fast attributed network alignment. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, pp. 1345–1354.
60. Zhang Y. (2018) Consistent polynomial-time unseeded graph matching for Lipschitz graphons. arXiv preprint arXiv:1807.11027.
61. Zhang Y. (2018) Unseeded low-rank graph matching by transform-based unsupervised point registration. arXiv preprint arXiv:1807.04680.
62. Zhou F. & De la Torre F. (2012) Factorized graph matching. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE Computer Society, pp. 127–134.
63. Zhu M. & Ghodsi A. (2006) Automatic dimensionality selection from the scree plot via the use of profile likelihood. Comput. Statist. Data Anal., 51, 918–930.
