Abstract
We consider the problem of graph matchability in non-identically distributed networks. In a general class of edge-independent networks, we demonstrate that graph matchability can be lost with high probability when matching the networks directly. We further demonstrate that under mild model assumptions, matchability is almost perfectly recovered by centering the networks using universal singular value thresholding before matching. These theoretical results are then demonstrated in both real and synthetic simulation settings. We also recover analogous core-matchability results in a very general core-junk network model, wherein some vertices do not correspond between the graph pair.
Keywords: graph matching, random graphs, singular value thresholding
1. Introduction and background
The graph matching problem (GMP) seeks to find an alignment between the vertex sets of two graphs that best preserves common structure across graphs. At its simplest, it can be formulated as follows: given two adjacency matrices $A$ and $B$ corresponding to $n$-vertex graphs, the GMP seeks to minimize $\|A - PBP^T\|_F$ over permutation matrices $P$; i.e., the GMP seeks a relabelling of the vertices of $B$ that minimizes the number of induced edge disagreements between $A$ and $B$. Variants and extensions of this problem have been extensively studied in the literature, with applications across areas as diverse as biology and neuroscience [10,18,30,58], computer vision [20,35,45], pattern recognition [33,48,62] and social network analysis [25,34,59], among others. For a survey of many of the recent applications and approaches to the GMP, see the sequence of survey papers [12,19,23]. While recent results [3] have whittled away at the complexity of the related graph isomorphism problem (determining whether a permutation matrix $P$ exists satisfying $A = PBP^T$), at its most general, where $A$ and $B$ are allowed to be weighted and directed, the GMP is known to be NP-hard. Indeed, in this case, the GMP is equivalent to the notoriously difficult quadratic assignment problem [7,8,36]. However, recent approaches that leverage efficient representation/learning methodologies (see, for example, [5,25,59]) have shown excellent empirical performance matching networks with up to millions of nodes.
In addition to algorithmic advancements in graph matching, there has been a flurry of activity studying the closely related problem of graph matchability: given a latent alignment between the vertex sets of two graphs, can graph matching uncover this alignment in the presence of shuffled vertex labels? This problem arises in a variety of contexts, from network de-anonymization and privatization to multi-network hypothesis testing [37] to multimodality graph embedding methodologies [10]. Many existing results are concerned with recovering a latent alignment present across random graph models, where $A$ and $B$ have identical marginal distributions, and exciting advancements on the threshold of matchable vs. unmatchable graphs have been made across many random graph settings, including the homogeneous correlated Erdős–Rényi model (see, for example, [4,17,39,44]), the correlated stochastic blockmodel setting (see, for example, [37,43]), the $\rho$-correlated heterogeneous Erdős–Rényi model (see, for example, [38,40]), and the correlated heterogeneous Erdős–Rényi model with varying edge correlations (see, for example, [40,49]). In the non-identically distributed model setting, the work in [13–15] provides theoretic phase transitions on matchability in the correlated Erdős–Rényi $(p,q,\rho)$ model (i.e., $A \sim$ Erdős–Rényi $(n,p)$, $B \sim$ Erdős–Rényi $(n,q)$ and the edge correlation across graphs is provided by the constant $\rho$; see Definition 1.2).
The above results range from providing theoretic phase transitions on matchability [13,14,37] to providing nearly efficient methods for achieving matchability from an algorithmic perspective [4,15,17,21]. While they have served to establish a novel theoretical understanding of the matchability problem, in each case the transition from matchable to unmatchable graphs is defined in terms of decreasing across-graph correlation and within-graph sparsity. Importantly, it is not a function of fundamentally different probabilistic structures across the graphs to be matched. As we often witness in applications, the graph topologies can differ significantly even among vertices that correspond to the same entity across networks. Social networks offer a compelling example of this, where matching across different social network platforms requires the understanding that not all users will be behaving homogeneously across different network platforms [34]. Both theoretically (see Example 2.2) and practically (see, for example, [10]), this distributional heterogeneity can have a deleterious effect on graph matchability.
Herein, we propose one possible solution for ameliorating the effect of $A$ and $B$ not being identically distributed, namely via a universal singular value thresholding (USVT) [9] centering preprocessing step. Working in a general correlated edge-independent random graph model (see Definition 1.1), we theoretically demonstrate that USVT centering asymptotically almost surely recovers the matchability for all but a vanishing fraction of the nodes (see Theorem 2.5). In addition, we recover an analogous result (see Theorem 2.9) in the setting in which only a fraction of the vertices possess a true correspondence across networks, generalizing and extending the results of [29,44,56].
This centering step is practically implementable on even very large networks and is demonstrated to have a significant positive impact on graph matchability in both real and synthetic data settings (see Section 3). While the results contained herein do not guarantee that any computationally efficient algorithm will be able to perfectly (or almost perfectly) align any given networks after USVT centering, they provide a theoretical foundation for subsequently studying algorithmic effectiveness. Indeed, they ensure that with high probability the optimal alignment according to the graph matching objective function is, essentially, the true latent vertex alignment, guaranteeing that subsequent optimization procedures are, at the least, seeking the right permutation.
1.1 Notation
The following notation will be used throughout the manuscript: for
, we will let
denote the hollow
matrix with all
’s on its off-diagonal,
will denote the
matrix of all
’s and
will denote the set
. We will consider
and
interchangeably as adjacency matrices and as the corresponding graphs consisting of vertices and edges. For a set
, we denote by
the induced subgraph of
on the vertices of
.
For a matrix $M$, the Frobenius norm of the matrix is defined as
$$\|M\|_F = \sqrt{\sum_{i,j} M_{ij}^2},$$
and the operator norm of $M$ is denoted
$$\|M\| = \sigma_1(M),$$
where $\sigma_1(M)$ is the largest singular value of $M$. We denote the
of
via
![]() |
Below we will make use of the following trace form of the Frobenius norm: $\|M\|_F^2 = \mathrm{trace}(M^T M)$; see [28] for more on the Frobenius norm and its many uses. For matrices
and
, we define
via
![]() |
where
is the
matrix of all
’s.
We will also make extensive use of modern asymptotic notation. To review, if
and
are non-negative functions of
, then we write
![]() |
1.2 Correlated heterogeneous Erdős–Rényi graphs
Formally the GMP we will consider is defined as follows.
Definition 1.1
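Let $A, B$ be the adjacency matrices of weighted, undirected graphs on $n$ vertices. The GMP is to find an element of
$$\operatorname*{argmin}_{P \in \Pi_n} \|A - PBP^T\|_F^2 = \operatorname*{argmax}_{P \in \Pi_n} \mathrm{trace}(APBP^T), \tag{1.1}$$
where $\Pi_n$ is the set of $n \times n$ permutation matrices.
Equation (1.1) follows here from
$$\|A - PBP^T\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\,\mathrm{trace}(APBP^T).$$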
We note here that, traditionally, the GMP formulated in Definition 1.1 is defined for unweighted graphs $A$ and $B$. The extension we consider to weighted graphs is commonly used in the literature (see, for example, the work in [53]) and is useful for studying situations in which edges/vertices in the network have weight features attached to them. This added flexibility will be needed for subsequent theoretical developments and data applications.
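To make the objective concrete, the following is a minimal sketch (not the authors' code) that evaluates the squared Frobenius disagreement and the trace form for a relabelling of a small symmetric graph, and solves the GMP by exhaustive search. The helper names `gmp_objective`, `gmp_trace` and `brute_force_gmp`, the graph size and the latent relabelling `sigma` are all illustrative assumptions; a permutation is encoded as an index array `perm`, so that the relabelled second graph has entries `B[perm[i], perm[j]]`.

```python
import itertools
import numpy as np


def gmp_objective(A, B, perm):
    """Squared Frobenius disagreement between A and B relabelled by perm."""
    B_perm = B[np.ix_(perm, perm)]  # entry (i, j) is B[perm[i], perm[j]]
    return float(np.sum((A - B_perm) ** 2))


def gmp_trace(A, B, perm):
    """Trace of A times the relabelled B (the trace form of the objective)."""
    B_perm = B[np.ix_(perm, perm)]
    return float(np.trace(A @ B_perm))


def brute_force_gmp(A, B):
    """Exhaustive search over all n! relabellings; only feasible for tiny n."""
    n = A.shape[0]
    perms = [np.array(p) for p in itertools.permutations(range(n))]
    return min(perms, key=lambda p: gmp_objective(A, B, p))


rng = np.random.default_rng(0)
n = 6  # hypothetical size, small enough to enumerate 720 permutations
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)  # symmetric, hollow 0/1 adjacency matrix
sigma = rng.permutation(n)           # hypothetical latent relabelling
B = np.zeros_like(A)
B[np.ix_(sigma, sigma)] = A          # B[sigma[i], sigma[j]] = A[i, j]

best = brute_force_gmp(A, B)         # attains zero disagreement
```

Because the second graph here is an exact relabelling of the first, the minimum is zero; the minimizer returned by the search need not equal `sigma` if the graph has non-trivial automorphisms. Minimizing the disagreement and maximizing the trace select the same permutations, since the two quantities differ by a constant and a factor of $-2$.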
In the presence of a latent vertex alignment,
, between the vertices of
and
, we wish to understand the extent to which graph matching
and
will recover
; i.e., if
is the permutation matrix corresponding to
, will
? In order to study this problem from a probabilistic perspective, we introduce a bivariate random graph model with a natural vertex alignment across graphs: the bivariate, correlated, heterogeneous Erdős–Rényi random graph.
Definition 1.2
For
symmetric, matrices, we say
are instantiations of the
-correlated heterogeneous Erdős–Rényi random graph model with parameters
(abbreviated as
) if
ER
; i.e.,
is an independent edge random graph with no self-loops satisfying
for each
![]()
ER
; i.e.,
is an independent edge random graph with no self-loops satisfying
for each
;
Edges across networks are collectively independent except that for each, the correlation between
and
is
Before proceeding further, we will make a few remarks on the
random graph model. In the homogeneous ER
model, network growth as
is natural, and we can consider an asymptotic regime in which
depends on
. Here, we similarly consider
and
to be dependent on
, but make no further assumptions on expressly how the dependence on
is manifest. This allows us to consider classical homogeneous Erdős–Rényi graphs, stochastic blockmodels [27], random dot product graphs (conditioned on the latent positions) [57], etc., as subfamilies of our random graph model.
In addition, by allowing
and
to differ, this model allows for a latent correspondence to exist in settings where the underlying topology and degree structure of the graphs to be matched differs significantly. This distributional heterogeneity is often observed in real data settings (see, for example, the connectomes being aligned in [10] and the social networks aligned in [34]), and we seek to understand the limitations of graph matching approaches when attempting to overcome this heterogeneity. Note also that when
, there are restrictions on feasible correlations
: Indeed, if
Bernoulli
and
Bernoulli
are
-correlated with
, then the correlation must satisfy
.
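The restriction on feasible correlations can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original development: it uses the standard fact that the joint success probability of a Bernoulli pair with success probabilities $p$ and $q$ lies between $\max(0, p+q-1)$ and $\min(p,q)$, which bounds the attainable correlation, and it samples a correlated pair by conditioning. The function names and the parameter values are hypothetical, and both probabilities are assumed to lie strictly between 0 and 1.

```python
import numpy as np


def max_correlation(p, q):
    """Largest correlation attainable by a Bernoulli(p), Bernoulli(q) pair."""
    lo, hi = min(p, q), max(p, q)
    return float(np.sqrt(lo * (1 - hi) / (hi * (1 - lo))))


def min_correlation(p, q):
    """Smallest attainable correlation, from P(X = 1, Y = 1) >= max(0, p + q - 1)."""
    joint = max(0.0, p + q - 1.0)
    return float((joint - p * q) / np.sqrt(p * (1 - p) * q * (1 - q)))


def sample_correlated_bernoulli(p, q, rho, size, rng):
    """Draw pairs (X, Y) with X ~ Bern(p), Y ~ Bern(q) and corr(X, Y) = rho."""
    eps = 1e-12  # tolerance for floating-point comparison at the boundary
    if not (min_correlation(p, q) - eps <= rho <= max_correlation(p, q) + eps):
        raise ValueError("rho is not a feasible correlation for these marginals")
    p11 = p * q + rho * np.sqrt(p * (1 - p) * q * (1 - q))  # P(X = 1, Y = 1)
    x = rng.random(size) < p
    # Conditional success probabilities of Y given X = 1 and given X = 0.
    y_prob = np.where(x, p11 / p, (q - p11) / (1 - p))
    y = rng.random(size) < y_prob
    return x.astype(int), y.astype(int)


rng = np.random.default_rng(1)
x, y = sample_correlated_bernoulli(0.3, 0.6, 0.4, 200_000, rng)
empirical_rho = float(np.corrcoef(x, y)[0, 1])
```

When the two success probabilities coincide, the upper bound equals 1, so any non-negative correlation is feasible; the further apart the marginals are, the smaller the largest attainable correlation becomes.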
Lastly, this model naturally allows us to consider a partition of
into core (
) and junk (
) vertices
; core vertices are those that have a corresponding vertex (i.e., true match) across networks, while junk vertices do not. If we consider
with
of the form
where
, then it is reasonable to define
and
. For all
it would then hold that
for all
, and
and
are independent random variables. A natural question to ask is when an optimal GM algorithm will correctly align the vertices in
across networks. This problem was studied in the context of homogeneous ER networks with constant correlation in [29], and the results in Section 2.4 generalize and extend those in [29] to this more adaptable network model.
Remark 1.1
In what follows,
and
are not necessarily assumed to be hollow matrices, and we allow for non-zero entries on the diagonals of the
. This is done to simplify eigen-decompositions in our proof methods. We do assume our graphs
and
are loop-free and have no self edges. As such,
(resp.,
) are necessarily hollow and do not necessarily equal
(resp.,
). We do have that for
,
(resp.,
), but this will not hold when
if
.
2. Graph matchability
In the
setting, we seek to understand when a graph matching procedure could correctly align the vertices across networks; i.e., if
where
denotes the identity matrix. More generally, if
is a dissimilarity (i.e., if it is a symmetric, non-negative function with
for all
; see, for example, [51]), when is it the case that
![]() |
In this more general framework, we consider the following definition of graph matchability.
Definition 2.1
Let
be a dissimilarity. We will say that
are
-matchable if
where
denotes the identity matrix.
By considering an appropriate
in Definition 2.1, we can fit the classical GMP in the formulation; indeed, the GMP of Definition 1.1 considers the dissimilarity defined via
![]() |
In this paper, we will consider, more generally, dissimilarity functions of the form
![]() |
for suitably defined matrix-valued function
![]() |
Special cases of interest in our present
setting are
![]() |
(2.1) |
![]() |
(2.2) |
where
(resp.,
) is a suitable estimate of
(resp.,
) derived from
(resp.,
). In addition to the notion of
-matchability for a dissimilarity
, we will also define the notion of oracle-matchability. We will say that
and
distributed
are
-matchable if
![]() |
In the sequel, oracle-matchability will provide a useful theoretic bridge between
-matchability and
-matchability. Note that we will write
-matchability and
-matchability for the notions defined in Eqs. (2.1) and (2.2), respectively.
A natural question to ask is why we define the GMP in terms of
and not in terms of a more general dissimilarity
; indeed, alternate dissimilarities
have been considered in the definition of the GMP in the graph matching literature (see, for example, [60,61]). Moreover, we consider the GMP objective function formulation in Definition 1.1 even though in numerous settings the optimal solution to this GMP may not be a given latent vertex alignment, and in this section, we will see instances of when
are, with high probability, not
-matchable. Our choice of
in the GMP is motivated by two main factors. First, this is the classical definition of graph matching and ties our current work to a vast graph matching literature. Secondly, we seek to understand conditions for when the original
-matchability fails, yet there is a suitable dissimilarity
for which
-matchability is achieved. As the formulation in Definition 1.1 is commonly used in practice, this could provide practical guidance for when vertex labels can be recovered via a different objective function viewpoint.
In recent work addressing the question of
-matchability, results have been established for the
,
setting (see, for example, [4,17,39,44]), in the correlated stochastic blockmodel setting (see, for example, [37,43]), in the correlated heterogeneous Erdős–Rényi model (see, for example, [38,40]), and in the general
and general
setting (see, for example, [40,49]). In the non-identically distributed model setting, the work in [13–15] considers
,
and
. In each setting, the results showed that for sufficiently dense, sufficiently correlated graphs,
-matchability is almost surely achieved. Converse results in [13,14,37,39] show that in the sufficiently sparse and/or weakly correlated setting,
-matchability is a.s. lost (i.e., a.s. the solution to the GMP is not the latent alignment). The work in [13,14] deserves special mention, as the converse results therein are proven for general
-matchability; i.e., for sufficiently sparse and/or weakly correlated networks,
-matchability is a.s. lost for all dissimilarities
.
In these examples, it is sparsity and/or weak dependence that is potentially thwarting the matching in each instance and not the heterogeneity of the model itself. As the next straightforward but illustrative example demonstrates, the degree and structural heterogeneity across networks allowed for in the
model makes the question of
-matchability a bit more nuanced.
Example 2.2
Consider the following correlated heterogeneous stochastic blockmodel example. Let
be distinct, and define
Let
be the vertices in block 1 in
and
, and let
be the vertices in block 2. Assuming
and, letting
and
we consider
of the form
Unlike in the cases where the loss of
-matchability is due to network sparsity and/or weak correlation, in this example the non-identically distributed nature of
and
can obfuscate the true alignment from a graph matching perspective. Indeed, for many choices of the parameters above, the optimal permutation for the GMP in Definition 1.1 will not be the latent correspondence, and permuting blocks 1 and 2 will, with high probability, yield a better GMP objective function.
To wit, let
be any permutation such that
(so that
aligns block 1 in
to block
in
and vice versa) with corresponding permutation
. The number of edges
such that
is bounded above by
and
Therefore,
Combined, we see that the difference in the objective function for
as compared to
is
Numerous choices of the parameters in this model (for example,
,
,
,
,
) yield that
for a positive constant
(in the example,
). As
is highly concentrated about its expectation (see Appendix A.1), there is high probability that
, and
i.e.,
and
would not be
-matchable.
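The phenomenon in this example can be reproduced with a short exact calculation. The following sketch uses a hypothetical two-block instance whose parameters are chosen purely for illustration (they are not the values used in the example above): within-block edge probabilities are reversed across the two graphs, and every aligned edge pair has the same correlation. It computes the expected number of edge disagreements under the true alignment and under the alignment that swaps the two blocks, using only the assumption that an edge in one graph is independent of every edge in the other graph except its aligned counterpart.

```python
import numpy as np


def expected_disagreements(L1, L2, R, perm):
    """Expected number of edge disagreements between A and B relabelled by perm.

    A[i, j] ~ Bernoulli(L1[i, j]) and B[i, j] ~ Bernoulli(L2[i, j]); only the
    same edge {i, j} is correlated across graphs, with correlation R[i, j].
    """
    n = L1.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            k, l = int(perm[i]), int(perm[j])
            p, q = L1[i, j], L2[k, l]
            both = p * q  # E[A_ij * B_kl] when the two edges are independent
            if {k, l} == {i, j}:  # same edge, so add the covariance term
                both += R[i, j] * np.sqrt(p * (1 - p) * q * (1 - q))
            total += p + q - 2 * both  # P(A_ij != B_kl)
    return total


# Hypothetical parameters: block size m, within-block probabilities a and b
# (reversed across graphs), cross-block probability c, edge correlation rho.
m = 50
a, b, c, rho = 0.8, 0.2, 0.5, 0.2
ones = np.ones((m, m))
L1 = np.block([[a * ones, c * ones], [c * ones, b * ones]])
L2 = np.block([[b * ones, c * ones], [c * ones, a * ones]])
R = rho * np.ones((2 * m, 2 * m))

identity = np.arange(2 * m)
swap = np.concatenate([np.arange(m, 2 * m), np.arange(m)])  # exchange the blocks

at_truth = expected_disagreements(L1, L2, R, identity)
at_swap = expected_disagreements(L1, L2, R, swap)
```

In this instance the swapped alignment has the smaller expected objective, so in expectation the graph matching objective prefers the wrong alignment even though every aligned edge pair is positively correlated. The chosen correlation is feasible for these marginals; larger values would not be.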
2.1 Centering to recover matching
In the previous example, we see that
can effectively make
and
not
-matchable. One way to recover the latent alignment in this heterogeneous setting is to transform the problem back into the homogeneous case, and rather than matching
and
, we would match
and
; yielding once again
. As the next theorem demonstrates, this is sufficient to a.s. recover
-matchability under mild model conditions. Before stating the theorem, we first must define some additional notation. For
, and permutation
, define the matrix
via
![]() |
(2.3) |
For each
in
, define
![]() |
(2.4) |
Theorem 2.3
Let
and consider
and
. For each
, define
If for all
and all
, we have
then
The proof of Theorem 2.3 relies on a now standard application of McDiarmid’s inequality, and is similar to the proofs of analogous matchability results in [37–39]; details of the proof can be found in Appendix A.1.
Remark 2.1
The growth condition on
in Theorem 2.3, namely
, is attempting to capture the necessary degree to which the entry-wise covariance matrix
needs to be asymmetric. If we define
, then from Eqs. (A.3) and (A.4) we see that for
,
and if
then
. Constraining
globally and not entry-wise allows for more flexibility in applying the theorem to settings where some of the edges are very sparse or weakly correlated.
Consider the growth condition on
, namely
, in the
,
homogeneous ER setting (wlog, assume
). In this setting, as
,
and
are
-matchable iff
and
are
-matchable. In the sparse setting of [13,Theorem 1],
-matchability is achieved with high probability when all of the following hold:
![]() |
(2.5) |
![]() |
(2.6) |
![]() |
(2.7) |
![]() |
(2.8) |
Note that these conditions cannot simultaneously hold when
![]() |
Indeed if
, Eq. (2.6) implies
, and hence
. Therefore,
![]() |
contradicting Eq. (2.7). In the
setting, modulo the sparsity conditions, there is a
-matchability phase transition at
, and the corresponding rate achieved in Theorem 2.3, namely
, is above this phase transition threshold. We view this as the price paid in the theorem for being able to handle both heterogeneous and homogeneous ER settings.
In this setting, the growth condition of Theorem 2.3,
, can hold in the dense setting
as well as when
In the dense setting of
,
-matchability transitions at
[37,39], which our theorem recovers (asymptotically).
In [13], the authors establish a phase transition at
, providing the corresponding converse result that ensures no
-matchability if
for all dissimilarities
. We do not derive a corresponding converse result to Theorem 2.3 herein (namely, a condition on
that ensures
and
are not
-matchable for any suitable
), as we are focused on how to practically recover
-matchability in the non-identically distributed setting; see Theorem 2.5.
2.2 Approximate centering to almost recover matching
Unfortunately, centering
and
by the true edge probability matrices
and
is impractical, as these model parameters are unknown in practice. Our solution to this hurdle is to estimate the unknown
and
via USVT [9], and then approximately center the networks via these estimates.
Our method for estimating
and
is based on the USVT method of [9], and USVT applied in the present setting is outlined in Algorithm 1.
Algorithm 1 USVT for estimating
—
Input: Adjacency matrix
, threshold
;
1. Let
be the singular value decomposition of
, with singular values ordered via
;
2. Let
be the set of singular values greater than the threshold
;
3. Define
Output:defined via
for all
, and for
![]()
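The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `usvt`, the threshold value and the homogeneous test graph are hypothetical, entries of the low-rank reconstruction are simply clipped to $[0,1]$, and the diagonal is left as estimated rather than treated separately.

```python
import numpy as np


def usvt(A, tau):
    """Keep singular values above tau, rebuild the matrix, clip entries to [0, 1]."""
    U, s, Vt = np.linalg.svd(A)
    keep = s > tau  # indices of singular values exceeding the threshold
    low_rank = (U[:, keep] * s[keep]) @ Vt[keep, :]
    return np.clip(low_rank, 0.0, 1.0), int(keep.sum())


rng = np.random.default_rng(0)
n, p = 400, 0.3  # hypothetical homogeneous Erdős–Rényi graph
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)  # symmetric adjacency matrix, no self-loops

tau = 2.01 * np.sqrt(n)              # illustrative threshold (an assumption)
P_hat, rank = usvt(A, tau)
A_centered = A - P_hat               # approximately centered adjacency matrix
```

For this test graph only the leading singular value exceeds the threshold, so the estimate is a rank-one matrix whose entries are close to the common edge probability. A full singular value decomposition costs cubic time in the number of vertices; for large sparse networks a truncated solver would be the natural substitute.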
If we estimate
via
and
via
using USVT, can we recover
-matchability using the approximately centered matrices,
and
, for a suitable
? Given the error introduced in estimating the edge probability matrices, the answer is unsurprisingly 'no', at least for the proof techniques we employ herein. However, if we slightly weaken Definition 2.1 to allow for a vanishing fraction of unmatched nodes, then we can recover a result analogous to Theorem 2.3. This motivates the following definition:
Definition 2.4
Let
be a dissimilarity. Consider random graphs
. We say that
and
are
-matchable if
Unwrapping Definition 2.4, we see that
-matchability is equivalent to any optimal permutation (under dissimilarity
) correctly recovering the labels of at least
vertices across
and
. As the following theorem indicates, under mild model assumptions,
and
are
-matchable asymptotically almost surely, where the estimates
and
in Eq. (2.2) are the USVT estimates. The proof of Theorem 2.5 can be found in Appendix A.2.
Theorem 2.5
Let
and further assume that for each
,
(i) There exists
such that
entry-wise. Note that for each
,
is fixed, though we allow
to vary in
.
(ii) We have that
.
(iii)
is approximately low rank in that there exists a
such that
where
are the singular values of
.
If for all
and all
, we have that there exists
such that
and for each
, there exist constants
such that if
is the USVT estimate of
with threshold level
and if
then
and
satisfy
Let us take a moment to explore the assumptions in Theorem 2.5. Assumptions (i) and (ii) control the allowable sparsity of the networks, ensuring that the minimum expected degree grows asymptotically faster than
. If the mean expected degree were
, then the graphs would be a.s. disconnected [6], and our proof techniques fail as
and
would no longer concentrate about
and
with high probability [32]. The rank assumption in (iii) is needed to control the accuracy of the USVT estimates of the unknown
’s. Practically, smaller
allow us to use suitable low-rank estimates
of
that are computationally easier to implement; this is indeed the case in many common random graph models such as the stochastic blockmodel [27] (where often
), random dot product graphs [57] and latent position random graphs [26] (where
is often taken to be
[52]), among others.
If
is bounded away from
entry-wise, and each entry of
is
(which is indeed the case in the oft adopted setting, where each
for a matrix
with entries of order
), then
as defined in Remark 2.1 satisfies
. We then have
From Eq. (A.10) in the proof of Theorem 2.5, we see that
-matchability is achieved here for
![]() |
If, in addition
and each
, then up to a logarithmic factor
and
are
-matchable, and an oracle graph matching algorithm would properly align all but potentially a vanishing fraction of the nodes across the graphs.
Remark 2.2
In Theorem 2.5, we estimate
via
using USVT with threshold
. In application, often suitable estimates of
can be obtained with rank
of order
or
[52], especially in the setting of latent space graph models. For the purposes of our proof approach, suitably good means that
(similarly for
). We do not explore this model selection question further here (i.e., estimating a suitable rank rather than a threshold for our USVT estimates), as in applications often only a relatively small number of singular values are above the USVT threshold.
2.3 When to center?
We have seen above that in the setting where both
and
are not
-matchable, centering
and
via
and
can recover
-matchability by ameliorating the effect of the differing
’s. Moreover, approximately centering by
and
theoretically recovers
-matchability for all but a vanishing fraction of the vertices. A natural question is in the case when
, does Theorem 2.5 imply that a.s. perfect
-matchability is potentially lost when USVT centering is performed unnecessarily?
Consider the following simple example, where
and
,
, and we vary
and
. In this example, there is no need to center before matching, and the variability introduced by estimating the
’s could potentially cause
![]() |
Fortunately, at least in this example we see this is not the case (see Fig. 1). As matching these graphs exactly (i.e., finding the argmin of the GMP) is computationally challenging, we use as a surrogate for
-matchability and
-matchability whether the true alignment is a local (rather than global) minima of the GMP before and after centering. To test this, we match the graph pairs (USVT centered and uncentered) using the constrained gradient-ascent based graph matching algorithm, FAQ [54], initialized at the true correspondence
. While FAQ is not guaranteed to terminate at a local minimum, if it terminates at
, then that is evidence in support of
’s local optimality. Moreover, if FAQ does not terminate at
, then
is not a local minimum of the GMP.
Fig. 1.

We plot the mean (
1 s.d.) of FAQ
FAQ
over
and
. The parameters considered are
and
.
Let FAQ
(resp., FAQ
) denote the number of vertices correctly matched by FAQ initiated at
when
and
are matched directly (resp., when
and
are USVT centered before they are matched). In Fig. 1, we plot the mean (
1 s.d.) of FAQ
FAQ
over
and
. From the figure, we see that there is no significant performance loss from centering the graphs first. Indeed, in highly structured/low-rank settings (e.g., homogeneous ER or SBM), we can obtain high-fidelity estimates of the individual entries of the
vs. the global estimates used in the current proof. These local estimates (which can also be obtained by non-spectral methods, e.g., in the
case we can use
) should allow for significantly less error to be introduced in the estimation of the
’s and will allow for little-to-no theoretic degradation due to centering. As we are more focused on the general
case, we do not pursue this further here.
2.4 Core matchability
Often in applications, only a fraction of the vertices in
possess a latent matched pair in
. We will denote those vertices that have a latent match across graphs as the core vertices (denoted
), and we will denote those vertices that do not have a latent match across graphs as the junk vertices (denoted
). In this section, we seek to further understand the ability of an oracle graph matching procedure to correctly match the cores across graphs. This motivates the following definition of core-matchability.
Definition 2.6
Let
, and consider a partition of the vertex sets into core and junk vertices,
Define
to be the set of core matching permutations. For dissimilarity
, we say that
and
are core
-matchable if
i.e., if any optimal permutation aligning
and
under
perfectly matches the cores across networks.
If we consider
with
of the form
where
, then it is reasonable to define
and
. Indeed, under these model assumptions
if either
or
is in
.
Completely analogously to the setting considered in Example 2.2, it is immediate that
and
need not be asymptotically almost surely core
-matchable even with non-vanishing core correlation in
. Indeed, as in Example 2.2,
and
can be chosen to effectively obfuscate the true alignment among the core vertices. Mimicking the results of Theorem 2.3, centering
and
again a.s. recovers core
-matchability of
and
under mild model assumptions. The proof of Theorem 2.7 is contained in Appendix A.3.
Theorem 2.7
Let
and consider
and
. Suppose that
is of the form
where
, and for each
, let
be defined as in Eq. (2.4). If for all
we have that
(2.9)
(2.10) and also if
, then
As before, if we define
then for
,
. If, in addition,
![]() |
then Eqs. (2.9) and (2.10) hold and core
-matchability is recovered. In the event that
is
or
then Theorem 2.7 implies that
and
are core
-matchable in the presence of nearly linear junk, with arbitrary junk structure. This result extends and generalizes the results in [29] to the non-homogeneous ER setting.
As before, if the unknown
and
are estimated via USVT, then we recover partial core matchability. Before formalizing this, we first need the following extension of Definition 2.4 to the core-junk setting.
Definition 2.8
Let
be a dissimilarity. Let
, and consider a partition of the vertex sets into core and junk vertices,
Define
We say that
and
are core
-matchable if
The following theorem provides the analogue of Theorem 2.5 in the core-junk setting. Note that the proof is completely analogous to that in Theorem 2.5, and so is omitted.
Theorem 2.9
Let
and consider
and
. Suppose that
is of the form
where
. With the assumptions on
and
from Theorem 2.5, and assume for each
,
(2.11)
(2.12) and also assume
. For each
, there exist constants
such that if
is the USVT estimate of
with threshold level
and if
then
3. Simulations and experiments
In the following sections, we explore the impact on graph matchability of USVT centering in both simulated and real data settings. We note here that precisely determining the level of
,
and
-matchability is infeasible for even modestly sized networks, as this would require exactly solving the NP-hard GMP. To circumvent this, we instead match our networks using the FAQ algorithm of [54] initialized at a variety of starting points including the true correspondence. As it is a Frank–Wolfe [24] based algorithm, if FAQ terminates at the true correspondence (or at a permutation which matches a high percentage of the vertices), then the true correspondence is an estimated local minimum of the GMP. Moreover, if FAQ is initialized at the true correspondence and does not terminate at the true correspondence, then the true correspondence is not a local minimum. Comparing objective function values across estimated local minima then allows us to approximately gauge the global optimality of the true correspondence. While this is not the same as directly finding the global minimum desired in the definition of matchability, it nonetheless provides a useful, principled heuristic for empirically studying both matchability and deviations therefrom.
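The heuristic described above can be sketched with the FAQ implementation that ships with SciPy (`scipy.optimize.quadratic_assignment` with `method="faq"`, available from SciPy 1.6). This is an illustration under stated assumptions, not the code used for the experiments: the helper name `faq_from_identity`, the graph size and the edge-flip noise model are hypothetical, and the true correspondence is taken to be the identity. Maximizing the trace objective is equivalent to minimizing the Frobenius disagreement.

```python
import numpy as np
from scipy.optimize import quadratic_assignment


def faq_from_identity(A, B):
    """Run FAQ started at the identity; return the permutation found and the
    fraction of vertices it leaves fixed (i.e., matched to themselves)."""
    n = A.shape[0]
    res = quadratic_assignment(
        A, B, method="faq", options={"maximize": True, "P0": np.eye(n)}
    )
    perm = res.col_ind
    return perm, float(np.mean(perm == np.arange(n)))


rng = np.random.default_rng(0)
n = 60  # hypothetical graph size
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
flips = np.triu(rng.random((n, n)) < 0.1, k=1)  # hypothetical edge-flip noise
A = (upper | upper.T).astype(float)
noisy = upper ^ flips                           # flip a random subset of edges
B = (noisy | noisy.T).astype(float)

perm, frac = faq_from_identity(A, B)
```

If `frac` equals 1, FAQ did not move away from the true correspondence, which is evidence (not proof) that the identity is a local optimum; a value below 1 shows that the identity is not a local optimum for this pair. The same call can be applied to centered matrices, since the routine accepts real-valued inputs.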
3.1 Simulation
To explore the utility of USVT centering as a graph matching preprocessing step, we consider the following experiment. We let
with
![]() |
and
of the form
![]() |
and we use the FAQ algorithm of [54] to match (i)
and
directly (labelled ‘Uncentered’ in Figs 2–3); (ii)
and
(labelled ‘Centered’ in Figs 2–3); and (iii)
and
(labelled ‘Approx. Centered’ in Figs 2–3). In each figure, we initialize the FAQ algorithm at
—i.e., at the true latent alignment—and at
—i.e., at the alignment completely confusing blocks one and two across networks. We plot the mean fraction of vertices matched correctly (
s.d.) by FAQ at the starting point that achieves the lowest graph matching objective function score (averaged over
Monte Carlo replicates). As mentioned above, if the fraction matched correctly is less than
, then the true alignment is not a local minimum of the graph matching objective function and the graphs are not
,
, or
-matchable (depending on what input FAQ is matching).
Fig. 2.

Fraction correctly matched by FAQ
s.d. (optimized over two different initializations:
and at
) vs.
when matching (i)
and
directly (labelled ‘Uncentered’); (ii)
and
(labelled ‘Centered’); and
and
(labelled ‘Approx. Centered’). Here,
captures the level of correlation between
and
(higher
means more correlation),
is fixed and results are averaged over
Monte Carlo replicates.
Fig. 3.

Fraction correctly matched by FAQ
s.d. (optimized over two different initializations—
and at
) vs.
when matching (i)
and
directly (labelled ‘Uncentered’); (ii)
and
(labelled ‘Centered’); and
and
(labelled ‘Approx. Centered’). Here,
is fixed and results are averaged over
Monte Carlo replicates.
In Fig. 2, we consider
—i.e., the graphs are size
—and in the USVT estimates
we used
as suggested in [9]. We plot
vs. the mean fraction of vertices matched correctly (
s.d.). It is unsurprising, in light of Example 2.2, that
and
are not only directly not
-matchable, but that the alignment found by FAQ matches none of the vertices correctly across graphs. Also of note in the figure is that as
increases, the oracle centered graphs
and
appear to be nearly
-matchable (in that the estimated local minimum found by FAQ is close to
). The steep performance drop off as
decreases is a consequence of the fact that in low-correlation regimes (low for a given
),
-matchability is often not recovered even through centering. Performance in the approximately centered case tracks performance in the centered case, with the USVT centering recovering the gains of the oracle centering. This empirically suggests that the
lower bound in Theorem 2.5 is not sharp, as USVT centering recovers full
-matchability in the high
regime. We surmise that if
is truly low-rank, USVT centering and true centering will recover perfect
- and
-matchability respectively as
increases.
In Fig. 3 we repeat the above simulation with
fixed and
. Using
as the USVT threshold, we again plot the mean fraction of vertices matched correctly (
s.d.) vs.
. As before, without centering the optimal alignment found by FAQ matches very few of the vertices correctly across graphs. The performance increase in the oracle centered setting as
increases is a consequence of the
-matchability (in Theorem 2.3) being an asymptotic result; indeed, we should not expect correlated small networks to be almost surely
-matchable even with the oracle centering. We note, however, that
(i.e., graph order
) here is sufficient for the asymptotically perfect
-matchability to be recovered. Again we see that performance in the approximately centered case tracks performance in the centered case, with the USVT centering achieving almost all of the gains of the oracle centering.
3.2 Twitter data
In order to analyse the impact of USVT centering with real data, we considered two graphs derived from Twitter.1 The two graphs are based on the most active Twitter users from April and May 2014. The graphs are unweighted, with an edge between two users if one mentioned the other during the given month. After keeping only the largest common connected component, the number of users was 431 in each graph.
As can be seen in Fig. 4, many vertices in this data set have very similar connectivity patterns. Indeed, the empirical Pearson correlation between the entries in the two adjacency matrices is
. It is not surprising that the similar graph topology across networks leads to good performance when matching the graphs without centering. Repeating the experiment in Section 3.1—i.e., matching the adjacency matrices
(‘April’) and
(‘May’) using FAQ initialized at
—yields
vertices correctly aligned across networks. Although the true correspondence is not optimal (according to the GM objective function), the estimated local optimal correspondence does match
of the vertices correctly across networks without the need for centering. It is worth noting that preprocessing the data via USVT centering before matching again yields
vertices correctly matched by FAQ initialized at
(with a suitable USVT threshold here being
). This suggests that the centering procedure does not hurt performance when the graph topologies are similar across networks, and as we will demonstrate below, can significantly increase performance when the graph topologies differ across networks. We do note here that we do not zero out the diagonal of
in the USVT step in this real data example, as here, hollow
’s led to significantly worse performance than the non-hollow
’s.
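The centering step discussed above, including the choice between hollow and non-hollow estimates, can be sketched as follows. This is a minimal illustration rather than the paper's USVT pseudocode verbatim: the function names `usvt_estimate` and `usvt_center`, the `hollow` flag, and the clipping of entries to [0, 1] (which presumes an unweighted graph) are assumptions made for the example, and the threshold is left to the caller.

```python
import numpy as np


def usvt_estimate(A, threshold, hollow=True):
    """Low-rank estimate of the edge-probability matrix by singular value
    thresholding: keep singular values >= threshold, then clip to [0, 1]."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A)
    keep = s >= threshold
    P_hat = (U[:, keep] * s[keep]) @ Vt[keep, :]
    P_hat = np.clip(P_hat, 0.0, 1.0)
    if hollow:
        # Optionally zero the diagonal of the estimate (no self-loops).
        np.fill_diagonal(P_hat, 0.0)
    return P_hat


def usvt_center(A, threshold, hollow=True):
    """Centered matrix A - P_hat that is passed to the matching step."""
    return np.asarray(A, dtype=float) - usvt_estimate(A, threshold, hollow)
```

Setting `hollow=False` corresponds to the non-hollow variant used in the real data example; which variant works better appears to be data dependent.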
Fig. 4.

In the left two panels, we plot the adjacency matrices of aligned Twitter graphs from April and May. In the right panel, we plot the degrees of each vertex in April vs. the degrees of the same vertex in May. The vertices in both graphs were sorted according to ascending degree in the April graph.
While both the centered and uncentered graphs are highly
- and
-matchable, respectively, centering does have a very interesting algorithmic effect here; see Fig. 5. In the figure, we plot the number of vertices in the April–May Twitter graphs correctly matched by FAQ vs. the graph matching objective function value. In each panel (on the left matching
and
, and on the right
and
), we initialize FAQ at 100 different starting points: once at
(labelled ‘P0 = I’ in the legend) and the rest at random permutation restarts (labelled ‘rand. start’ in the legend). The figure suggests that centering has the effect of creating a more stable objective function gap between the estimated optimal permutation and suboptimal alternatives. In a setting where multiple random restarts are possible (and needed) to recover an unknown latent alignment, this suggests that the optimal alignment is perhaps more easily recognized in the centered graph regime, and hence that an online stopping criterion is more easily implementable.
Fig. 5.

We plot the number of vertices in the April–May Twitter graphs correctly matched by FAQ vs. the graph matching objective function value. In each panel (on the left matching
and
, and on the right
and
), we initialize FAQ at 100 different starting points: once at
(labelled ‘P0 = I’ in the legend) and the rest at random restarts (labelled ‘rand. start’ in the legend).
To explore the performance of the two approaches (centering and not centering) in the setting of different network topologies, we consider the following synthetic data experiment. We choose
random users from the Twitter networks and add
to their induced subgraphs in the May network, where
ER
(followed by again binarizing
); i.e., if the set of randomly chosen vertices is
, then
and the subsequent
is binarized before matching or centering. We consider
. This experiment is simulating the setting where a fraction of the network changes its behaviour from April to May; in this case by increasing their volume of mentions month to month. For each value of
, we repeat this experiment
times and plot the mean accuracy (
1 s.d.) of graph matching using FAQ initialized at
both with and without USVT centering; see Fig. 6. This experiment demonstrates the capacity for USVT to maintain
-matchability in the face of additive deviations in the network structure. These deviations have the effect of altering the graph topology month-to-month, and with enough signal, they have a precipitously negative impact on the performance of matching sans centering. Centering ameliorates this effect, and emphasizes the common signal in the networks by removing the effect of this additive noise. It is interesting to note that for small values of
, centering negatively impacts algorithmic performance. We view this as potentially an artifact of the noise in these settings not being sufficient to obfuscate the true matching without centering.
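The perturbation used in this experiment (add Erdős–Rényi noise to the subgraph induced by a random vertex subset, then binarize) can be sketched as follows. The function name `perturb_induced_subgraph` and its arguments `k` (number of perturbed vertices) and `p` (noise edge probability) are hypothetical names chosen for the example; the input is assumed to be a symmetric, hollow 0/1 adjacency matrix.

```python
import numpy as np


def perturb_induced_subgraph(B, k, p, rng):
    """Add ER(k, p) noise to the subgraph induced by k random vertices.

    Returns the perturbed, re-binarized adjacency matrix together with the
    chosen vertex subset.
    """
    B = np.asarray(B).astype(int)
    n = B.shape[0]
    S = rng.choice(n, size=k, replace=False)

    # Symmetric, hollow noise matrix E with independent Bernoulli(p) edges.
    upper = np.triu(rng.random((k, k)) < p, 1)
    E = (upper | upper.T).astype(int)

    B_new = B.copy()
    block = np.ix_(S, S)
    # Adding the noise and binarizing amounts to an entrywise cap at one.
    B_new[block] = np.minimum(B[block] + E, 1)
    return B_new, S
```

Because the noise is purely additive before binarization, the perturbed vertices can only gain edges among themselves, which mimics a group of users increasing its volume of mentions from one month to the next.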
Fig. 6.

In the left panel, we plot the average matching accuracy (
s.d.) of graph matching using FAQ initialized at
when first choosing
random vertices, denoted
from graph
and then substituting
(where
is again binarized after noise is added) before matching and centering; here
. Accuracy is plotted vs.
In the right two panels, we plot the degrees of each vertex in April vs. the degrees of the same vertex in May (with the
substitution).
In the core-junk setting, the heterogeneity among the junk vertices offers a further setting for demonstrating the utility of USVT-centered graph matching. To see this, we consider the following experiment: choose
uniformly sampled core vertices from the Twitter network and
uniformly sampled junk vertices,
, for the April graph and
uniformly sampled junk vertices,
, for the May graph. As before, we match
and
using FAQ initialized at
both with and without USVT centering; results are summarized in Fig. 7. As seen previously, the ability of USVT-centering to ameliorate the degree/distributional heterogeneity (here among the junk vertices) leads to superior core label recovery compared to the uncentered matching setting.
Fig. 7.

We plot the average core matching accuracy (
s.d.) for
and
using FAQ initialized at
(with
) against
. Results are averaged over
Monte Carlo iterates.
3.3 Connectomes
For our next example, we consider the diffusion MRI data from [31]. The dataset consists of test–retest pairs (used to evaluate reproducibility of the magnetization prepared rapid acquisition gradient echo image protocol). Each scan is converted into a weighted connectome by considering 70 brain regions of interest (labelled according to the Desikan brain atlas [16]) as the vertices, with edge weights counting the number of neural fiber bundles connecting the regions. See [46,47] for more detail on how these graphs were constructed. As vertices correspond to canonical brain regions of interest, it is natural to consider the true correspondence across graphs as being given by the identity mapping.
To illustrate the role of USVT centering in this data set, we first consider as an example a pair of graphs generated as above from the data in [31]. The respective adjacency matrices for this graph pair are shown in Fig. 8. Matching these brains directly using FAQ initialized at
yields an estimated local optimum with 65 vertices correctly aligned across graphs; indeed, by permuting vertices
, we obtain a better objective function value than that of the GMP evaluated at
. We seek to understand the ability of USVT centering, which is global in nature, to correct these local mismatches.
Fig. 8.

Adjacency matrices of two sample brains from the dataset in [31].
To study this further, we apply a variant of the USVT procedure in which we automatically select the number of singular values to threshold by combining the ideas of USVT with the profile likelihood work of [63]; to wit, we select the threshold dimension via an elbow analysis of the scree plot of the singular values. We chose this automated procedure rather than setting a singular value threshold because these graphs are weighted, and the common threshold of
from [9,55] is presented for the unweighted setting. Centering the pair of graphs from Fig. 8 recovers the identity
as an estimated local minimum of the GMP, and the global centering corrects the localized mismatch.
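An elbow selection of the kind described above can be sketched as follows. This is one common formulation of a profile likelihood elbow (two Gaussian groups with separate means and a pooled variance), offered as an assumption about how such a rule might be implemented rather than as the exact procedure used for these experiments; the function name `profile_likelihood_elbow` is hypothetical.

```python
import numpy as np


def profile_likelihood_elbow(values):
    """Return q, the number of leading values kept before the elbow.

    The sorted values are split into the top q and the remaining p - q; each
    group is modelled as Gaussian with its own mean and a pooled variance,
    and q is chosen to maximize the resulting profile log-likelihood.
    """
    d = np.sort(np.asarray(values, dtype=float))[::-1]
    p = d.size
    if p < 3:
        raise ValueError("need at least three values to locate an elbow")

    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        top, rest = d[:q], d[q:]
        ss = np.sum((top - top.mean()) ** 2) + np.sum((rest - rest.mean()) ** 2)
        var = max(ss / (p - 2), 1e-12)  # floor guards against log(0)
        ll = -0.5 * p * np.log(2.0 * np.pi * var) - ss / (2.0 * var)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q
```

Applied to the singular values of a weighted adjacency matrix, the returned `q` would play the role of the number of singular values retained in the low-rank estimate that is subtracted before matching.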
Extending this to a 41-scan sample from [31] (hence, we consider 41 graphs each with 70 vertices), we run FAQ initialized at
for the
pairs of distinct graphs with both USVT centering and no centering. When matching graph
and graph
for
, we let
![]() |
In Fig. 9, we plot a heatmap of the
differences
, so that the
th entry in the heatmap corresponds to the excess number of correct matches achieved by USVT centering. Red values indicate more correctly matched via centering and blue values indicate more correctly matched via no centering. The colour intensity indicates the value of
achieved, with darker colours indicating more (in the red case) or fewer (in the blue case) vertices correctly matched after centering. The figure demonstrates that the phenomenon observed in the graphs in Fig. 8 was not an anomaly. Only two pairs see an improvement in matching accuracy when not centering, while
pairs see an improvement in matching accuracy when USVT centering. Moreover, while many of the mismatches are local in nature, they are nonetheless ameliorated by the global USVT centering procedure.
Fig. 9.

For the
weighted brain pairs, we plot a heatmap of the differences
, so that the
th entry in the heatmap corresponds to the excess number of correct matches achieved by USVT centering. Red values indicate more correctly matched via centering and blue values indicate more correctly matched via no centering. The colour intensity indicates the value of
achieved, with darker colours indicating more (in the red case) or fewer (in the blue case) vertices correctly matched after centering.
If we consider running the same experiment on the unweighted brain graphs (using USVT centering with threshold
), we see the delicate nature of the USVT threshold in data applications (see Fig. 10). We note here that, again, in this unweighted case, we do not zero out the diagonal of
in the USVT step in this real data example. When
,
pairs achieved improved matching performance when USVT centering first, and
pairs achieved improved matching performance when not centering. When
,
pairs achieved improved matching performance when USVT centering first, and
pairs achieved improved matching performance when not centering. These results suggest two important takeaways: first, performance is intimately tied to proper thresholding; and second, in this example, USVT centering is more effective in the weighted edge case. This suggests both that the magnitudes of the weights contribute significantly to the mismatch and that USVT centering is effective at ameliorating this edge weight heterogeneity; this is not entirely unexpected, as the centering is precisely trying to eliminate the different edge probability/weight structures across the
.
Fig. 10.

For the
unweighted brain pairs, we plot a heatmap of the differences
, so that the
th entry in the heatmap corresponds to the excess number of correct matches achieved by USVT centering. Red values indicate more correctly matched via centering and blue values indicate more correctly matched via no centering. The colour intensity indicates the value of
achieved, with darker colours indicating more (in the red case) or fewer (in the blue case) vertices correctly matched after centering. In the left panel, we center by USVT with
; in the right panel by
.
4. Discussion
Understanding the limits of
-matchability is an essential step in robust multiple graph inference regimes. When graphs are not
-matchable—i.e., the true node correspondence cannot be recovered in the face of noise—paired graph inference methodologies that utilize the across graph correspondence (see, for example, [2,50]) cannot gainfully be employed. Non-
-matchability can limit analysis to methods that rely on graph statistics invariant to relabelling of the vertices; such methods can be useful, but they lack the full power of their parametric (with the labelling as a parameter) counterparts. In this paper, we establish initial theoretical results on
-matchability when the graphs to be matched differ in distribution, and when only a fraction of the graphs are matchable.
While our theoretical results and subsequent simulations and experiments provide a basis for a deeper understanding of the effect that distributional heterogeneity has on
-matchability, there is still much to be done. For example, in our present USVT centering step, the across graph correlation provided by
is not utilized. We suspect that the error in the USVT steps could be greatly reduced by leveraging
or an estimate thereof. We are also exploring graph normalization strategies other than centering, such as kernel smoothing (using a small number of a priori known correspondences to choose the proper smoothing kernels), which may be more appropriate in the presence of multiplicative or other nonlinear noise structures. We suspect that the growth rate of
in Theorem 2.5 is not sharp, as in simulation and real data settings the true correspondence is almost perfectly recovered via USVT centering before matching; however, sharpening the lower bound on
with our present methods does not seem feasible, and new ideas and techniques will need to be employed.
We suspect that the bounds on
vs.
obtained in Theorem 2.9 are also suboptimal. As an informal argument, we can consider known results for the quadratic assignment problem for i.i.d. entries [8]. In the uncorrelated dense homogeneous Erdős–Rényi setting, these results imply that the best solution to the GMP, Eq. (1.1), will reduce the objective function
as compared to a random guess. On the other hand, in the dense homogeneous Erdős–Rényi setting with constant correlation, the best solution is
better than a random guess. Under the heuristic that in order for the core vertices to be matched correctly, the signal from correlation must be greater than the possible improvements in the all noise setting, we can conjecture that a core matchability threshold at approximately
may be possible. Using our present proof technique, we are unable to achieve this rate. This heuristic argument, though problematic, provides a potential guidepost for future work.
In addition, while the theoretical results presented herein are for the case that the graphs are simple undirected graphs with no edge weights, the graph matching framework of Definition 1.1 is flexible, allowing us to accommodate many of the features (both those considered above and additional eccentricities) inherent to real data settings. In the weighted, loopy setting, the matching can occur between the weighted adjacency matrices or normalized Laplacian matrices,
(where
is the diagonal matrix with
th entry equal to
) and the similarly defined
. To match directed graphs, the graphs can either be made undirected (for example, by matching
to
) or the directed adjacency matrices can be directly plugged into Eq. (1.1). Developing similar results to Theorems 2.5 and 2.9 in models (akin to our CorrER model) that incorporate these graph features, as well as additional vertex and edge features, is a natural next step.
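The preprocessing options mentioned in this paragraph can be sketched as follows. Both helpers are assumptions made for illustration: `normalized_laplacian` uses the common convention of scaling the adjacency matrix on both sides by the inverse square root of the row sums, and `symmetrize` averages a directed adjacency matrix with its transpose, which is only one of several ways to make a directed graph undirected. Neither name nor convention is confirmed by the surviving text.

```python
import numpy as np


def normalized_laplacian(A):
    """D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums of A.

    Vertices with zero row sum are left with all-zero rows and columns.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    positive = deg > 0
    inv_sqrt[positive] = 1.0 / np.sqrt(deg[positive])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def symmetrize(A):
    """One possible undirected version of a directed adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return (A + A.T) / 2.0
```

Either transformed matrix can then be passed to the matching routine in place of the raw adjacency matrix; the directed matrices can also be matched as they are.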
The information and computational limits for
-matchability are still open problems for which we have pushed the boundaries, but significantly more work remains to be done. These problems are analogous to the recently addressed problems of detection and recovery for the planted partition and planted clique problems for a single graph [1,22,42]. For these settings, exact fundamental limits have been established and polynomial time algorithms have been shown to achieve or nearly achieve these limits. Obtaining similar results for the GMP is a key step towards a robust statistical framework for multiple graph inference.
Acknowledgements
This material is based on research sponsored by the Air Force Research Laboratory and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8750-18-2-0035. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory and DARPA or the US Government. We would also like to thank Prof. Carey E. Priebe, Prof. Minh Tang and Joshua Cape for their helpful discussions in the writing of this manuscript.
Funding
MIT Lincoln Labs and the Department of Defense (for Dr. Sussman); NIH (for Dr. Lyzinski, via grant BRAIN U01-NS108637); Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0035.
Appendix. A. Proofs
Herein, we collect proofs of the main theoretical results in this manuscript. Before stating our proofs, we first state some well-known facts about the bivariate Bernoulli distribution. Indeed, if
model, then for each
,
and
can be realized as a bivariate Bernoulli random variable. This will be a key insight in the proof of our main results, Theorems 2.3 and 2.5.
Spelling this out further, a pair of Bernoulli random variables
has a BiBernoulli distribution with
![]() |
(A.1) |
if
for each
. A key property of BiBernoulli random variables is that they can be generated by a triple of independent Bernoulli random variables. For
as above, setting
),
, and
, with
and
independent yields
![]() |
In the
model, we then have that
![]() |
where
![]() |
(A.2) |
are independent Bernoulli random variables.
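The idea of generating a correlated Bernoulli pair from three independent Bernoulli variables can be illustrated with the sketch below. It uses one standard construction, which may differ in parametrization from the one stated above: draw X, then set Y equal to one of two independent Bernoulli variables depending on the value of X, with success probabilities chosen so that Y has the desired marginal and the pair has the desired Pearson correlation. The function names and the argument names `p`, `q` and `rho` are assumptions for the example, and `p` is assumed to lie strictly between 0 and 1.

```python
import numpy as np


def conditional_probs(p, q, rho):
    """Success probabilities of Y given X = 1 and given X = 0.

    Chosen so that X ~ Bernoulli(p), Y ~ Bernoulli(q) and corr(X, Y) = rho.
    """
    a = q + rho * np.sqrt(q * (1 - q) * (1 - p) / p)  # P(Y = 1 | X = 1)
    b = q - rho * np.sqrt(q * (1 - q) * p / (1 - p))  # P(Y = 1 | X = 0)
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("rho is not attainable for these marginals")
    return float(a), float(b)


def sample_bibernoulli(p, q, rho, size, rng):
    """Sample correlated pairs (X, Y) from three independent Bernoullis."""
    a, b = conditional_probs(p, q, rho)
    X = rng.random(size) < p
    Z1 = rng.random(size) < a  # used when X = 1
    Z0 = rng.random(size) < b  # used when X = 0
    Y = np.where(X, Z1, Z0)
    return X.astype(int), Y.astype(int)
```

The range check makes explicit that not every correlation is attainable for a given pair of marginals, which is why heterogeneous edge probabilities constrain the admissible correlation.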
A.1 Proof of Theorem 2.3
The key to the proof of Theorem 2.3 is the well-known McDiarmid’s inequality [41].
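Proposition A.1 (McDiarmid’s inequality).
Let $X = (X_1, \ldots, X_m)$ be a sequence of independent random variables. Let $f$ be a real-valued function such that for all $i \in \{1, \ldots, m\}$, and for all $x, x'$ in the range of $X$ differing only in the $i$th coordinate,
$$|f(x) - f(x')| \le c_i.$$
If $Z = f(X)$, then for any $t > 0$,
$$\mathbb{P}\left(|Z - \mathbb{E}Z| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{m} c_i^2}\right).$$
□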
Proof of Theorem 2.3. Let
, and consider
and
.
If
and
are
-correlated Bernoulli random variables with respective parameters
and
, then it follows that
![]() |
Let
be a permutation matrix that permutes exactly
labels and let
be the associated permutation for
. Note that
![]() |
satisfies
![]() |
where
![]() |
is the number of transpositions induced by
. Note that
, and so
![]() |
(A.3) |
For each
, define the matrix
as in Eq. (2.3). We then have
![]() |
(A.4) |
For ease of notation, we define
![]() |
so that
![]() |
As
, and each
pair is a function of three independent Bernoulli random variables (the
from Eq. (A.2) that are independent by construction), we have that
is a function of
![]() |
and so is a function of
independent Bernoulli random variables, where
then satisfies
![]() |
Changing the value of one of these Bernoulli random variables leaving all others fixed can change the value of at most one
pair, and this pair appears in two terms in the sum
. As each term in the sum of
is bounded in
, we have that, in the notation of Proposition A.1, each
can be uniformly set to
. Proposition A.1 then yields (setting
)
![]() |
(A.5) |
with
being an appropriate positive constant that may change line to line. If
then
![]() |
(A.6) |
To finish the proof, we apply a union bound on all such
. The number of such permutations
that permute
vertex labels is upper bounded by
. Combining with Eq. (A.6), we have that under the assumptions of the theorem
![]() |
as desired.□
A.2 Proof of Theorem 2.5
Key to the proof of Theorem 2.5 are the following lemmas, adapted here from [55,Lemmas 1 and 2].
Lemma A.1
Let
. Suppose
for
fixed. Let
be the singular value decomposition of
, and let
Then
where
are the singular values of
.
Lemma A.2
Let
ER(
); i.e.,
is a hollow, symmetric matrix with
. Assume
for some
. If
for a constant
, then for all
there exists a constant
such that
Note first that
![]() |
with the analogous result holding for
. Under the assumptions of Theorem 2.5, Lemma A.2 implies that there exist constants
such that
![]() |
with probability at least
, and therefore there exist constants
such that
![]() |
with probability at least
.
Next, we apply Lemma A.1 with
,
(resp.,
,
). With probability at least
, there exist constants
such that if
then (where
is as defined in the USVT pseudocode, Algorithm 1)
![]() |
where the equality follows from the rank assumption on
in Theorem 2.5; similarly, for
we have
![]() |
Combining the above, we have that there exists an event
such that
, and on
,
![]() |
To prove Theorem 2.5, we proceed as follows. Fix
so that
; i.e.,
permutes exactly
labels. Two simple applications of the triangle inequality yield that
![]() |
and
![]() |
Combining the above, we have that
![]() |
In the proof of Theorem 2.3, if we set
in Eq. (A.5) when applying McDiarmid’s inequality, then under the assumptions of Theorem 2.5, there exists an event
with
such that on
,
![]() |
Next, note that
![]() |
Hoeffding’s inequality (see, for example, [11]) yields that
![]() |
(A.7) |
![]() |
(A.8) |
with probability at least
![]() |
(A.9) |
(where the last inequality follows from the assumptions of the theorem, as under these assumptions
). Therefore there exists an event
such that Eqs. (A.7) and (A.8) hold on
, and 
Writing
![]() |
we see that
![]() |
We see then that on
,
![]() |
(A.10) |
Let
be such that
![]() |
If
for
,
on
, where
![]() |
Define the event
![]() |
and note that
![]() |
Combined, this yields
![]() |
as desired.□
A.3 Proof of Theorem 2.7
For a given permutation
on
, we define the permutation
uniquely as follows:
![]() |
(A.11) |
For example, if
,
, and
![]() |
then
![]() |
For a permutation matrix
, we define
analogously, where we recall here that
is the set of permutation matrices
in
satisfying
(i.e., fixing all core labels). Define
. Define the events
![]() |
is the event that the optimal GMP permutation is not in
and the graphs are not core
-matchable for
. As
, we have that
.
Suppose that
(with corresponding permutation
) permutes
core labels, where
![]() |
(A.12) |
![]() |
(A.13) |
Applying the results in Appendix A on the bivariate Bernoulli distribution, we see that
is a function of
independent Bernoulli random variables, where
![]() |
As in the proof of Theorem 2.3, we next apply Proposition A.1 to bound the probability that
provides a better matching than
. By the assumption that
if either
or
is a junk vertex, it holds that
![]() |
To ease notation, we define
, so that
![]() |
To use a union bound, note that the number of permutations
with error counts in Eqs. (A.12–A.13) given by
and
is bounded above by
. Let
be the number of core vertices permuted by
. Hence, if
and
![]() |
then we have that
![]() |
as desired.□
Footnotes
These graphs were provided as part of the DARPA XDATA project.
Contributor Information
Vince Lyzinski, Email: vlyzinsk@umd.edu.
Daniel L Sussman, Email: sussman@bu.edu.
References
- 1. Abbe E. & Sandon C. (2015) Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery. 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. Washington, DC, USA: IEEE Computer Society, pp. 670–688.
- 2. Asta D. & Shalizi C. (2015) Geometric network comparisons. Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI’15. Arlington, Virginia, USA: AUAI Press, pp. 102–110.
- 3. Babai L. (2016) Graph isomorphism in quasipolynomial time. Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM, pp. 684–697.
- 4. Barak B., Chou C., Lei Z., Schramm T. & Sheng Y. (2019) (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. Advances in Neural Information Processing Systems, pp. 9186–9194.
- 5. Bayati M., Gerritsen M., Gleich D. F., Saberi A. & Wang Y. (2009) Algorithms for large, sparse network alignment problems. 2009 Ninth IEEE International Conference on Data Mining. Miami, FL: IEEE, pp. 705–710.
- 6. Bollobás B. (2001) Random Graphs. Cambridge University Press.
- 7. Bougleux S., Brun L., Carletti V., Foggia P., Gaüzère B. & Vento M. (2017) Graph edit distance as a quadratic assignment problem. Pattern Recogn. Lett., 87, 38–46.
- 8. Cela E. (2011) The Quadratic Assignment Problem: Theory and Algorithms; Combinatorial Optimization, vol. 1. New York, NY: Springer.
- 9. Chatterjee S. (2015) Matrix estimation by universal singular value thresholding. Ann. Stat., 43, 177–214.
- 10. Chen L., Vogelstein J. T., Lyzinski V. & Priebe C. E. (2016) A joint graph inference case study: the C. elegans chemical and electrical connectomes. Worm, vol. 5.
- 11. Chung F. & Lu L. (2006) Concentration inequalities and martingale inequalities: a survey. Int. Math., 3, 79–127.
- 12. Conte D., Foggia P., Sansone C. & Vento M. (2004) Thirty years of graph matching in pattern recognition. Int. J. Pattern Recogn. Artificial Intell., 18, 265–298.
- 13. Cullina D. & Kiyavash N. (2016) Improved achievability and converse bounds for Erdős–Rényi graph matching. ACM SIGMETRICS Performance Evaluation Review, vol. 44. New York, NY, USA: ACM, pp. 63–72.
- 14. Cullina D. & Kiyavash N. (2017) Exact alignment recovery for correlated Erdős–Rényi graphs. arXiv preprint arXiv:1711.06783.
- 15. Cullina D., Kiyavash N., Mittal P. & Poor H. V. (2018) Partial recovery of Erdős–Rényi graph alignment via $k$-core alignment. arXiv preprint arXiv:1809.03553.
- 16. Desikan R. S., Ségonne F., Fischl B., Quinn B. T., Dickerson B. C., Blacker D., Buckner R. L., Dale A. M., Maguire R. P. & Hyman B. T (2006) An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage, 31, 968–980. [DOI] [PubMed] [Google Scholar]
- 17. Ding J., Ma Z., Wu Y. & Xu J. (2018) Efficient random graph matching via degree profiles. arXiv preprint arXiv:1811.07821.
- 18. Elmsallati A., Clark C. & Kalita J (2016) Global alignment of protein-protein interaction networks: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform., 13, 689–705. [DOI] [PubMed] [Google Scholar]
- 19. Emmert-Streib F., Dehmer M. & Shi Y. (2016) Fifty years of graph matching, network alignment and network comparison. Inform. Sci., 346–347, 180–197. [Google Scholar]
- 20. Escolano F., Hancock E. R. & Lozano M. (2011) Graph matching through entropic manifold alignment. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE Computer Society, pp. 2417–2424. [Google Scholar]
- 21. Fang F., Sussman D. L. & Lyzinski V. (2018) Tractable graph matching via soft seeding. arXiv preprint arXiv:1807.09299.
- 22. Feldman V., Grigorescu E., Reyzin L., Vempala S. & Xiao Y. (2013) Statistical algorithms and a lower bound for detecting planted cliques. Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC’13. New York, NY, USA: ACM, pp. 655–664. [Google Scholar]
- 23. Foggia P., Percannella G. & Vento M. (2014) Graph matching and learning in pattern recognition in the last 10 years. Int. J. Pattern Recogn. Artificial Intell., 28, 1450001. [Google Scholar]
- 24. Frank M. & Wolfe P. (1956) An algorithm for quadratic programming. Nav. Res. Logist. Q., 3, 95–110. [Google Scholar]
- 25. Heimann M., Shen H., Safavi T. & Regal D. K. (2018) Representation learning-based graph alignment. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. New York, NY, USA: ACM, pp. 117–126. [Google Scholar]
- 26. Hoff P. D., Raftery A. E. & Handcock M. S. (2002) Latent space approaches to social network analysis. J. Amer. Statist. Assoc., 97,1090–1098. [Google Scholar]
- 27. Holland P. W., Laskey K. B. & Leinhardt S. (1983) Stochastic blockmodels: First steps. Soc. Netw., 5,109–137. [Google Scholar]
- 28. Horn R. A. & Johnson C. R. (2012) Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press. [Google Scholar]
- 29. Kazemi E., Yartseva L. & Grossglauser M. (2015) When can two unlabeled networks be aligned under partial overlap? 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). Washington, DC, USA: IEEE Computer Society, pp. 33–42. [Google Scholar]
- 30. Klau G. W. (2009) A new graph-based method for pairwise global network alignment. BMC Bioinform., 10, S59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Landman B. A., Huang A. J., Gifford A., Vikram D. S., Lim I. A. L., Farrell J. A. D., Bogovic J. A., Hua J., Chen M. & Jarso S. (2011) Multi-parametric neuroimaging reproducibility: a 3-t resource study. Neuroimage, 54, 2854–2866. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Le C. M., Levina E. & Vershynin R. (2017) Concentration and regularization of random graphs. Random Struct. Algor., 51,538–561. [Google Scholar]
- 33. Lee J., Cho M. & Lee K. M. (2010) A graph matching algorithm using data-driven markov chain monte carlo sampling. 2010 20th International Conference on Pattern Recognition (ICPR). Washington, DC, USA: IEEE Computer Society, pp. 2816–2819. [Google Scholar]
- 34. Li L. & Campbell W. M. (2015) Matching community structure across online social networks. NIPS Workshop on Networks in the Social and Information Sciences. Montreal, Quebec, Canada. [Google Scholar]
- 35. Lin L., Liu X. & Zhu S.-C. (2010) Layered graph matching with composite cluster sampling. IEEE Trans. Pattern Anal. Mach. Intell., 32, 1426–1442. [DOI] [PubMed] [Google Scholar]
- 36. Loiola E. M., de Abreu N. M. M., Boaventura-Netto P. O., Hahn P. & Querido T. (2007) A survey for the quadratic assignment problem. Eur. J. Oper. Res., 176,657–690. [Google Scholar]
- 37. Lyzinski V. (2018) Information recovery in shuffled graphs via graph matching. IEEE Trans. Inform. Theory, 64, 3254–3273.
- 38. Lyzinski V., Fishkind D. E., Fiori M., Vogelstein J. T., Priebe C. E. & Sapiro G. (2016) Graph matching: relax at your own risk. IEEE Trans. Pattern Anal. Mach. Intell., 38, 60–73.
- 39. Lyzinski V., Fishkind D. E. & Priebe C. E. (2014) Seeded graph matching for correlated Erdős–Rényi graphs. J. Mach. Learn. Res., 15, 3513–3540.
- 40. Lyzinski V., Levin K. & Priebe C. E. (2019) On consistent vertex nomination schemes. J. Mach. Learn. Res., 20, 1–39.
- 41. McDiarmid C. (1989) On the method of bounded differences. Surveys Combin., 141, 148–188.
- 42. Mossel E., Neeman J. & Sly A. (2014) Belief propagation, robust reconstruction and optimal recovery of block models. Proceedings of The 27th Conference on Learning Theory, pp. 356–370.
- 43. Onaran E., Garg S. & Erkip E. (2016) Optimal de-anonymization in random graphs with community structure. 2016 50th Asilomar Conference on Signals, Systems and Computers. Washington, DC, USA: IEEE Computer Society, pp. 709–713.
- 44. Pedarsani P. & Grossglauser M. (2011) On the privacy of anonymized networks. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, pp. 1235–1243.
- 45. Robles-Kelly A. & Hancock E. R. (2007) A Riemannian approach to graph embedding. Pattern Recogn., 40, 1042–1056.
- 46. Gray W. R., Bogovic J. A., Vogelstein J. T., Landman B. A., Prince J. L. & Vogelstein R. J. (2012) Magnetic resonance connectome automated pipeline: an overview. IEEE Pulse, 3, 42–48.
- 47. Roncal W. R., Koterba Z. H., Mhembere D., Kleissas D. M., Vogelstein J. T., Burns R., Bowles A. R., Donavos D. K., Ryman S. & Jung R. E. (2013) MIGRAINE: MRI graph reliability analysis and inference for connectomics. IEEE Global Conf. Signal Inform. Process., 313–316.
- 48. Sang J. & Xu C. (2012) Robust face-name graph matching for movie character identification. IEEE Trans. Multimedia, 14, 586–596.
- 49. Sussman D. L., Lyzinski V., Park Y. & Priebe C. E. (2019) Matched filters for noisy induced subgraph detection. IEEE Trans. Pattern Anal. Mach. Intell., 1–1.
- 50. Tang M., Athreya A., Sussman D. L., Lyzinski V., Park Y. & Priebe C. E. (2017) A semiparametric two-sample hypothesis testing problem for random dot product graphs. J. Comput. Graph. Statist., 26, 344–354.
- 51. Trosset M. W., Priebe C. E., Park Y. & Miller M. I. (2008) Semisupervised learning from dissimilarity data. Comput. Statist. Data Anal., 52, 4643–4657.
- 52. Udell M. & Townsend A. (2017) Nice latent variable models have log-rank. arXiv preprint arXiv:1705.07474.
- 53. Umeyama S. (1988) An eigendecomposition approach to weighted graph matching problems. IEEE Trans. Pattern Anal. Mach. Intell., 10, 695–703.
- 54. Vogelstein J. T., Conroy J. M., Lyzinski V., Podrazik L. J., Kratzer S. G., Harley E. T., Fishkind D. E., Vogelstein R. J. & Priebe C. E. (2014) Fast approximate quadratic programming for graph matching. PLoS ONE, 10(4).
- 55. Xu J. (2017) Rates of convergence of spectral methods for graphon estimation. arXiv preprint arXiv:1709.03183.
- 56. Yartseva L. & Grossglauser M. (2013) On the performance of percolation graph matching. Proceedings of the First ACM Conference on Online Social Networks. New York, NY, USA: ACM, pp. 119–130.
- 57. Young S. & Scheinerman E. (2007) Random dot product graph models for social networks. Proceedings of the 5th International Conference on Algorithms and Models for the Web-Graph. Berlin, Heidelberg: Springer, pp. 138–149.
- 58. Zaslavskiy M., Bach F. & Vert J. P. (2009) A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell., 31, 2227–2242.
- 59. Zhang S. & Tong H. (2016) FINAL: Fast attributed network alignment. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, pp. 1345–1354.
- 60. Zhang Y. (2018) Consistent polynomial-time unseeded graph matching for Lipschitz graphons. arXiv preprint arXiv:1807.11027.
- 61. Zhang Y. (2018) Unseeded low-rank graph matching by transform-based unsupervised point registration. arXiv preprint arXiv:1807.04680.
- 62. Zhou F. & De la Torre F. (2012) Factorized graph matching. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE Computer Society, pp. 127–134.
- 63. Zhu M. & Ghodsi A. (2006) Automatic dimensionality selection from the scree plot via the use of profile likelihood. Comput. Statist. Data Anal., 51, 918–930.