Random Structures & Algorithms. 2016 Oct 21;49(4):694–741. doi: 10.1002/rsa.20692

Harnessing the Bethe free energy

Victor Bapst 1, Amin Coja‐Oghlan 1
PMCID: PMC5153882  PMID: 28035178

ABSTRACT

A wide class of problems in combinatorics, computer science and physics can be described along the following lines. There are a large number of variables ranging over a finite domain that interact through constraints that each bind a few variables and either encourage or discourage certain value combinations. Examples include the k‐SAT problem or the Ising model. Such models naturally induce a Gibbs measure on the set of assignments, which is characterised by its partition function. The present paper deals with the partition function of problems where the interactions between variables and constraints are induced by a sparse random (hyper)graph. According to physics predictions, a generic recipe called the “replica symmetric cavity method” yields the correct value of the partition function if the underlying model enjoys certain properties [Krzakala et al., PNAS (2007) 10318–10323]. Guided by this conjecture, we prove general sufficient conditions for the success of the cavity method. The proofs are based on a “regularity lemma” for probability measures on sets of the form $\Omega^n$ for a finite $\Omega$ and a large $n$ that may be of independent interest. © 2016 Wiley Periodicals, Inc. Random Struct. Alg., 49, 694–741, 2016

Keywords: random graphs, Belief Propagation, cavity method, regularity lemma

1. INTRODUCTION

Despite their simplicity, or perhaps because thereof, the first and the second moment method are the most widely used techniques in probabilistic combinatorics. Erdős famously employed the first moment method to lower‐bound the Ramsey number as well as to establish the existence of graphs of high girth and high chromatic number 28, 29. Even a half‐century on, deterministic constructions cannot hold a candle to these probabilistic results 14, 41. Moreover, the second moment method has been used to count prime factors 51 and Hamilton cycles 47 as well as to determine the two possible values of the chromatic number of a sparse random graph 3.

Yet there are quite a few problems for which the standard first and the second moment methods are too simplistic. The random k‐SAT model is a case in point. There are $n$ Boolean variables $x_1,\dots,x_n$ and $m$ clauses $a_1,\dots,a_m$, where $m=\lceil\alpha n\rceil$ (the real $\alpha n$ rounded up to the next integer) for some fixed $\alpha>0$. Each clause binds $k$ variables, which are chosen independently and uniformly, and discourages them from taking precisely one of the $2^k$ possible truth value combinations. The forbidden combination is chosen uniformly and independently for each clause.

The random k‐SAT instance $\Phi=\Phi_k(n,m)$ naturally gives rise to a probability measure on the set $\{0,1\}^n$ of all Boolean assignments. Indeed, for a given parameter $\beta\geq0$ the Gibbs measure $\mu_{\Phi,\beta}$ is defined by letting

\[
\mu_{\Phi,\beta}(\sigma)=\frac{1}{Z_\beta(\Phi)}\prod_{i=1}^{m}\exp\big(-\beta\,\mathbf{1}\{\sigma\text{ violates }a_i\}\big) \tag{1.1}
\]

for every assignment $\sigma\in\{0,1\}^n$, where

\[
Z_\beta(\Phi)=\sum_{\sigma\in\{0,1\}^n}\prod_{i=1}^{m}\exp\big(-\beta\,\mathbf{1}\{\sigma\text{ violates }a_i\}\big) \tag{1.2}
\]

is called the partition function. Thus, the Gibbs measure weighs assignments according to the number of clauses that they violate. In effect, by tuning $\beta$ we can interpolate between just the uniform distribution on $\{0,1\}^n$ ($\beta=0$) and a measure that strongly favours satisfying assignments ($\beta\to\infty$). Hence, if we think of $\Phi$ as inducing a “height function” $\sigma\mapsto\#\{\text{clauses of }\Phi\text{ violated by }\sigma\}$ on the set of assignments, then varying $\beta$ allows us to explore the resulting landscape. Apart from its intrinsic combinatorial interest, the shape of the height function, the so‐called “Hamiltonian”, governs the performance of algorithms such as the Metropolis process or Simulated Annealing.
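For concreteness, the following minimal sketch evaluates $Z_\beta(\Phi)$ from (1.2) by brute force on a tiny instance. The helper names (`make_formula`, `violates`, `partition_function`) are ours, not from the paper, and the enumeration is of course only feasible for very small $n$.

```python
# A minimal brute-force sketch (ours) of the random k-SAT partition
# function Z_beta(Phi) from (1.2).
import itertools, math, random

def make_formula(n, m, k, rng):
    """A random k-SAT formula: each clause is k variable indices, chosen
    independently and uniformly, plus one uniformly random forbidden
    truth-value combination."""
    return [(tuple(rng.randrange(n) for _ in range(k)),
             tuple(rng.randrange(2) for _ in range(k))) for _ in range(m)]

def violates(sigma, clause):
    """sigma violates the clause iff it hits the forbidden combination."""
    variables, forbidden = clause
    return all(sigma[v] == f for v, f in zip(variables, forbidden))

def partition_function(n, formula, beta):
    """Z_beta(Phi): sum over all 2^n assignments of exp(-beta * #violated)."""
    return sum(math.exp(-beta * sum(violates(s, c) for c in formula))
               for s in itertools.product((0, 1), repeat=n))

phi = make_formula(n=12, m=30, k=3, rng=random.Random(0))
for beta in (0.0, 1.0, 5.0):        # beta = 0 gives exactly 2^12 = 4096
    print(beta, partition_function(12, phi, beta))
```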

To understand the Gibbs measure it is key to get a handle on the partition function $Z_\beta(\Phi)$. Of course, the default approach to this kind of problem would be to apply the first and second moment methods. However, upon closer inspection it emerges that $Z_\beta(\Phi)\leq\exp(-\Omega(n))\,\mathrm{E}[Z_\beta(\Phi)]$ with high probability for any $\alpha,\beta>0$ 5. In other words, the first moment over‐estimates the partition function of a typical random formula by an exponential factor. The reason for this is a “lottery effect”: a tiny minority of formulas make an exceptionally high contribution to $\mathrm{E}[Z_\beta(\Phi)]$. Unsurprisingly, going to the second moment only exacerbates the problem, and thus for any $\alpha,\beta>0$ we find $\mathrm{E}[Z_\beta(\Phi)^2]\geq\exp(\Omega(n))\,\mathrm{E}[Z_\beta(\Phi)]^2$. In other words, the second moment method fails rather spectacularly for all possible parameter combinations.

The first and the second moment method fall victim to similar large deviations effects in many similar “random constraint satisfaction problems”. These problems, ubiquitous in combinatorics, information theory, computer science and physics 4, 36, 46, can be described along the following lines. A random factor graph, chosen either from a uniform distribution (like the random k‐SAT model above) or from a suitable configuration model, induces interactions between the variables and the constraints. The variables range over a fixed finite domain $\Omega$ and each constraint binds a few variables. The constraints come with “weight functions” that either encourage or discourage certain value combinations of the incident variables. Multiplying up the weight functions of all the constraints, just as in (1.1), (1.2), we obtain the Gibbs measure and the partition function.

With the standard first and second moment method drawing a blank, we seem to be at a loss as far as calculating the partition function is concerned. However, physicists have put forward an ingenious albeit non‐rigorous alternative called the cavity method 36. This technique, which applies almost mechanically to any problem that can be described in the language of sparse random factor graphs, yields an explicit conjecture as to the value of the partition function. More specifically, the cavity method comes in several installments. In this paper, we are concerned with the simplest, so‐called “replica symmetric” version.

In one of their key papers 34 physicists hypothesized abstract conditions under which the replica symmetric cavity method yields the correct value of the partition function. The thrust of this paper is to prove corresponding rigorous results. Specifically, according to 34 the replica symmetric cavity method gives the correct answer if the Gibbs measure satisfies certain correlation decay properties. For example, the Gibbs uniqueness condition requires that under the Gibbs measure the value assigned to a variable x is asymptotically independent of the values assigned to the variables at a large distance from x in the factor graph. In Corollary 4.6 below we prove that this condition is indeed sufficient to guarantee the success of the cavity method. Additionally, Theorems 4.4 and 4.5 yield rigorous sufficient conditions in terms of substantially weaker conditions, namely a symmetry property and the non‐reconstruction property.

A key feature of the paper is that we establish these results not for specific examples but generically for a very wide class of factor graph models. Of course, stating and proving general results requires a degree of abstraction. In particular, we resort to the framework of local weak convergence of graph sequences 8, 35. This framework suits the physics predictions well, which come in terms of the “limiting tree” that describes the local structure of a large random factor graph. To be precise, the replica symmetric prediction is given by a functional called the Bethe free energy applied to an (infinite) random tree.

The principal tool to prove these results is a theorem about the structure of probability measures on sets of the form $\Omega^n$ for some fixed finite set $\Omega$ and a large integer $n$, Theorem 2.1 below. We expect that this result, which is inspired by Szemerédi's regularity lemma 49, will be of independent interest. To prove our results about random factor graphs, we combine Theorem 2.1 with the theory of local weak convergence to carry out, completely generically, “smart” first and second moment arguments that avoid the lottery effects that the standard arguments fall victim to.

In Section 2 we begin with the abstract results about probability measures on cubes. Subsequently, in Section 3 we set the stage by introducing the formalism of factor graphs and local weak convergence. Further, in Section 4 we state and prove the main results about Gibbs measures on random factor graphs. Finally, Section 5 contains the proof of a technical result that enables us to control the local structure of random factor graphs.

1.1. Related Work

A detailed (non‐rigorous) discussion of the cavity method can be found in 36. It is known that the replica symmetric version of the cavity method does not always yield the correct value of the partition function. For instance, in some factor graph models there occurs a “condensation phase transition” beyond which the replica symmetric prediction is off 20, 34. The more complex “1‐step replica symmetry breaking (1RSB)” version of the cavity method 37 is expected to yield the correct value of the partition function some way beyond condensation. However, another phase transition called “full replica symmetry breaking” spells doom for even the 1RSB cavity method 36.

The replica symmetric cavity method has been vindicated rigorously in various special cases. For instance, Montanari and Shah 38 proved that in the random k‐SAT model the replica symmetric prediction is correct up to the Gibbs uniqueness threshold. A similar result was obtained by Bandyopadhyay and Gamarnik 9 for graph colorings and independent sets. Furthermore, Dembo, Montanari and Sun 23 proved the replica symmetric conjecture on a class of models with specific types of constraints. A strength of 23 is that the result applies even to sequences of non‐random factor graphs under a local weak convergence assumption. But both 23, 38 are based on the “interpolation method” 30, 32, 42, which entails substantial restrictions on the types of models that can be handled. By contrast, the present proof method is based on a completely different approach centered around the abstract classification of measures on cubes that we present in Section 2.

Since the “vanilla” second moment method fails on the random k‐SAT model, more sophisticated variants have been proposed. The basic idea is to apply the second moment method not to the partition function itself but to a tweaked random variable. For instance, Achlioptas and Moore 2 applied the second moment method to NAE‐satisfying assignments, i.e., both the assignment and its binary inverse satisfy all clauses. However, the number of NAE‐satisfying assignments is exponentially smaller than the total number of satisfying assignments and thus this type of argument cannot yield the typical value of the partition function. The same is true of the more subtle random variable of Achlioptas and Peres 6. Furthermore, the work of Ding, Sly and Sun 26 that yields the precise k‐SAT threshold for large k is based on applying the second moment method to a random variable whose construction is guided by the 1RSB cavity method. Among other things, the random variable from 26 incorporates conditioning on the local structure of the factor graph, an idea that will be fundamental to our arguments as well.

The material of Section 2 of the present paper has recently been investigated from the more analytic viewpoint of the theory of graph limits 18. This leads to a general notion of limits of probability measures on discrete cubes. The article 18 also discusses the connection with the Aldous‐Hoover representation of exchangeable arrays, which has long been known to be related to the theory of graph limits, and Panchenko's notion of asymptotic Gibbs measures 7, 33, 44, 45. A further recent application of the methods of the present paper to two special classes of random factor graph models can be found in 19.

1.2. Notation

If $X$ is a finite set, then we denote by $\mathcal{P}(X)$ the set of probability measures on $X$. Moreover, $\|\cdot\|_{TV}$ signifies the total variation norm. If $\mu$ is a probability measure on a product space $X^V$ for finite sets $X$, $V$ and $U\subseteq V$, then $\mu_U\in\mathcal{P}(X^U)$ denotes the marginal distribution of $\mu$ on $U$. That is, if $(x_u)_{u\in U}\in X^U$, then
\[
\mu_U\big((x_u)_{u\in U}\big)=\sum_{(x_u)_{u\in V\setminus U}\in X^{V\setminus U}}\mu\big((x_v)_{v\in V}\big).
\]

If $U=\{u\}$ for some $u\in V$, then we briefly write $\mu_u$ rather than $\mu_{\{u\}}$. Further, if $S\subseteq X^V$ is an event with $\mu(S)>0$, then $\mu[\,\cdot\,|S]$ is the conditional distribution given $S$. That is, for any event $Q\subseteq X^V$ we have $\mu[Q|S]=\mu[Q\cap S]/\mu[S]$.

The entropy of a probability measure $\mu\in\mathcal{P}(X)$ is denoted by $H(\mu)$. Thus, with the convention that $0\ln 0=0$ we have $H(\mu)=-\sum_{x\in X}\mu(x)\ln\mu(x)$. Further, agreeing that $0\ln\frac{0}{0}=0$ as well, we recall that the Kullback‐Leibler divergence of $\mu,\nu\in\mathcal{P}(X)$ is
\[
D(\nu\|\mu)=\sum_{x\in X}\nu(x)\ln\frac{\nu(x)}{\mu(x)}\in[0,\infty].
\]

We are going to work a lot with probability measures on sets $\Omega^n$ for a (small) finite $\Omega$ and a large integer $n$. If $\mu\in\mathcal{P}(\Omega^n)$, then we write $\sigma_\mu,\tau_\mu$ for two independent samples from $\mu$. Where $\mu$ is obvious from the context we just write $\sigma,\tau$. Additionally, if $X(\sigma)$ is a random variable, then $\langle X(\sigma)\rangle_\mu=\sum_{\sigma\in\Omega^n}\mu(\sigma)X(\sigma)$ stands for the expectation of $X$ with respect to $\mu$. Further, if $\sigma\in\Omega^n$, $U\subseteq[n]$ and $\omega\in\Omega$, then we let
\[
\sigma[\omega|U]=|\sigma^{-1}(\omega)\cap U|/|U|.
\]

Thus, $\sigma[\,\cdot\,|U]$ is a probability distribution on $\Omega$, namely the distribution of $\sigma(\mathbf{x})$ for a uniformly random $\mathbf{x}\in U$. If $U=\{x\}$ for some $x\in[n]$, then we just write $\sigma[\omega|x]$ rather than $\sigma[\omega|\{x\}]$. Clearly, $\sigma[\omega|x]=\mathbf{1}\{\sigma(x)=\omega\}$.
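Since the empirical distribution $\sigma[\,\cdot\,|U]$ and the total variation distance recur throughout, here is a small sketch (ours, not from the paper) of both; the later code sketches reuse these two helpers.

```python
# Helpers (ours) for the empirical distribution sigma[.|U] and the total
# variation distance; reused by the later sketches.
from collections import Counter

def empirical(sigma, U, Omega):
    """sigma[omega|U] = |sigma^{-1}(omega) intersect U| / |U|."""
    counts = Counter(sigma[x] for x in U)
    return {w: counts[w] / len(U) for w in Omega}

def tv(p, q):
    """Total variation distance of two distributions on a finite set."""
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0))
                     for w in set(p) | set(q))

sigma = [0, 1, 1, 0, 1, 0, 0, 1]
print(empirical(sigma, range(8), (0, 1)))        # {0: 0.5, 1: 0.5}
print(tv({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}))  # 0.25
```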

We use the $\langle\cdot\rangle_\mu$ notation for averages over $\mu\in\mathcal{P}(\Omega^n)$ to avoid confusion with averages over other, additional random quantities, for which we reserve the common symbols $\mathrm{E}[\cdot],\mathrm{P}[\cdot]$. Furthermore, we frequently work with conditional expectations. Hence, let us recall that for a probability space $(\mathcal{X},\mathcal{A},P)$, a random variable $X:\mathcal{X}\to\mathbb{R}$ and a $\sigma$‐algebra $\mathfrak{F}\subseteq\mathcal{A}$ the conditional expectation $\mathrm{E}[X|\mathfrak{F}]$ is an $\mathfrak{F}$‐measurable random variable on $\mathcal{X}$ such that for every $\mathfrak{F}$‐measurable event $F$ we have $\mathrm{E}[\mathbf{1}\{F\}\mathrm{E}[X|\mathfrak{F}]]=\mathrm{E}[\mathbf{1}\{F\}X]$. Moreover, recall that the conditional variance is defined as $\mathrm{Var}[X|\mathfrak{F}]=\mathrm{E}[X^2|\mathfrak{F}]-\mathrm{E}[X|\mathfrak{F}]^2$.

In line with the two previous paragraphs, if $Y:\Omega^n\to\mathbb{R}$ is a random variable, $\mu\in\mathcal{P}(\Omega^n)$ and $\mathfrak{F}$ is a $\sigma$‐algebra on $\Omega^n$, then we write $\langle Y|\mathfrak{F}\rangle_\mu$ for the conditional expectation, which is an $\mathfrak{F}$‐measurable random variable $\sigma\in\Omega^n\mapsto\langle Y|\mathfrak{F}\rangle_\mu(\sigma)$. Accordingly, for an event $A\subseteq\Omega^n$ with $\mu(A)>0$ we write $\langle Y|A\rangle_\mu=\langle Y\mathbf{1}\{A\}\rangle_\mu/\mu(A)$ for the expectation of $Y$ given $A$.

Finally, we need the Paley‐Zygmund inequality [43, Lemma 19]: if $X\geq0$ is a random variable with a finite second moment, then
\[
P[X\geq t\,\mathrm{E}[X]]\geq(1-t)^2\frac{\mathrm{E}[X]^2}{\mathrm{E}[X^2]}\quad\text{for any }0<t<1. \tag{1.3}
\]

2. PROBABILITY MEASURES ON THE CUBE

In this section we present a general “regularity lemma” for probability measures on sets $\Omega^n$ for some finite set $\Omega$ and a large integer $n$ (Theorem 2.1 below).

2.1. Examples

Needless to say, probability distributions on sets $\Omega^n$ for a small finite $\Omega$ and a large integer $n$ are ubiquitous. To get an idea of what we might hope to prove about them in general, let us look at a few examples.

The simplest case certainly is a product measure $\mu=p^{\otimes n}$ with $p\in\mathcal{P}(\Omega)$. By the Chernoff bound, for any fixed $\varepsilon>0$ there is $n_0=n_0(\varepsilon,\Omega)>0$ such that for $n>n_0$ we have
\[
\big\langle\|\sigma[\,\cdot\,|U]-p\|_{TV}\big\rangle_\mu<\varepsilon\quad\text{for every }U\subseteq[n]\text{ such that }|U|\geq\varepsilon n. \tag{2.1}
\]

In words, if we fix a large enough set U of coordinates and then choose σ randomly, then with probability close to one the empirical distribution on U will be close to p.

As a twist on the previous example, let $p\in\mathcal{P}(\Omega)$, assume that $n$ is a square and define a measure $\mu$ by letting
\[
\mu(\omega_1,\dots,\omega_n)=\prod_{i=0}^{\sqrt n-1}\Big[p(\omega_{1+i\sqrt n})\,\mathbf{1}\{\forall j\in[\sqrt n]:\omega_{j+i\sqrt n}=\omega_{1+i\sqrt n}\}\Big].
\]
In words, the coordinates come in blocks of size $\sqrt n$. While the values of all the coordinates in one block coincide and have distribution $p$, the coordinates in different blocks are independent. Although $\mu$ is not a product distribution, (2.1) is satisfied for any fixed $\varepsilon>0$ and large enough $n$. Furthermore, if for a fixed $k>1$ we choose $x_1,\dots,x_k\in[n]$ uniformly and independently, then
\[
\mathrm{E}\big\|\mu_{\{x_1,\dots,x_k\}}-\mu_{x_1}\otimes\cdots\otimes\mu_{x_k}\big\|_{TV}<\varepsilon, \tag{2.2}
\]
provided that $n>n_1(\varepsilon,k,\Omega)$ is sufficiently large. This is because for large enough $n$ it is unlikely that two of the randomly chosen $x_1,\dots,x_k$ belong to the same block.

As a third example, consider the set $\Omega=\{0,1\}$ and the measure $\mu$ defined by
\[
\mu^{(0)}(\omega_1,\dots,\omega_n)=\Big(\tfrac13\Big)^{\sum_{i=1}^n\omega_i}\Big(\tfrac23\Big)^{n-\sum_{i=1}^n\omega_i},\qquad
\mu^{(1)}(\omega_1,\dots,\omega_n)=\Big(\tfrac23\Big)^{\sum_{i=1}^n\omega_i}\Big(\tfrac13\Big)^{n-\sum_{i=1}^n\omega_i},\qquad
\mu=\tfrac12\big(\mu^{(0)}+\mu^{(1)}\big).
\]
All the marginals $\mu_i$, $i\in[n]$, are equal to the uniform distribution on $\{0,1\}$. But of course the uniform distribution on $\Omega^n$ is a horrible approximation to $\mu$. Indeed, by the Chernoff bound, with overwhelming probability a point $(\omega_1,\dots,\omega_n)$ drawn from $\mu$ either satisfies $\frac1n\sum_{i=1}^n\omega_i\approx1/3$ or $\frac1n\sum_{i=1}^n\omega_i\approx2/3$. However, the conditional distribution given, say, $\frac1n\sum_{i=1}^n\omega_i\leq1/2$, is close to a product measure. Thus, $\mu$ induces a decomposition of $\Omega^n$ into two “states” $S_0=\{\frac1n\sum_{i=1}^n\omega_i\leq1/2\}$, $S_1=\{\frac1n\sum_{i=1}^n\omega_i>1/2\}$ such that $\mu[\,\cdot\,|S_0],\mu[\,\cdot\,|S_1]$ are close to product measures.
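A quick simulation, under our own naming, illustrates this decomposition: samples from $\mu$ have empirical mean concentrated near $1/3$ or $2/3$, and conditioning on the state $S_0$ recovers (approximately) the product measure $\mu^{(0)}$.

```python
# Simulation sketch (ours) of the two-state mixture example.
import random

def sample_mu(n, rng):
    p = rng.choice((1 / 3, 2 / 3))        # pick the mixture component
    return [int(rng.random() < p) for _ in range(n)]

rng, n = random.Random(1), 10_000
means = [sum(sample_mu(n, rng)) / n for _ in range(200)]
state0 = [m for m in means if m <= 0.5]   # samples falling into S_0
print(min(state0), max(state0))           # both close to 1/3
```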

As a final example, consider $\Omega=\{0,1\}$, assume that $n$ is even and define $\mu\in\mathcal{P}(\Omega^n)$ by letting
\[
\mu(\omega_1,\dots,\omega_n)=\Big(\tfrac12\Big)^{n/2}\Big(\tfrac13\Big)^{\sum_{i>n/2}\omega_i}\Big(\tfrac23\Big)^{n/2-\sum_{i>n/2}\omega_i}.
\]
In words, $\mu$ is a product measure with marginal distribution ${\rm Be}(1/2)$ on the first $n/2$ coordinates and ${\rm Be}(1/3)$ on the other coordinates. Clearly, $\mu$ satisfies (2.1) with $p={\rm Be}(1/2)$ for sets $U\subseteq[n/2]$ and with $p={\rm Be}(1/3)$ for sets $U\subseteq[n]\setminus[n/2]$, provided that $n$ is large.

In summary, the following picture emerges. The conditions (2.1) and (2.2) are proxies for saying that a given measure $\mu$ resembles a product measure. Furthermore, in order to obtain from a given $\mu$ measures that satisfy (2.1) or (2.2) it may be necessary to decompose the space $\Omega^n$ into “states” so that the conditional distributions have these properties. In addition, because different coordinates may have different marginal distributions, for (2.1) to hold it may be necessary to partition the set $[n]$ of coordinates.

2.2. Homogeneity

The main result of this section shows that by partitioning the space $\Omega^n$ and/or the set $[n]$ of coordinates it is always possible to “approximate” a given measure $\mu$ by measures that satisfy (2.1) for some suitable $p$ as well as (2.2). In fact, the number of parts that we have to partition $[n]$ and $\Omega^n$ into is bounded in terms of the desired accuracy alone, independently of $n$.

Let us introduce some terminology. If $\mathcal{V}=(V_1,\dots,V_k)$ is a partition of some set $V$, then we call $\#\mathcal{V}=k$ the size of $\mathcal{V}$. Furthermore, a partition $\mathcal{W}=(W_1,\dots,W_l)$ refines another partition $\mathcal{V}=(V_1,\dots,V_k)$ if for each $i\in[l]$ there is $j\in[k]$ such that $W_i\subseteq V_j$.

For $\varepsilon>0$ we say that $\mu\in\mathcal{P}(\Omega^n)$ is $\varepsilon$‐regular on a set $U\subseteq[n]$ if for every subset $W\subseteq U$ of size $|W|\geq\varepsilon|U|$ we have
\[
\big\langle\|\sigma[\,\cdot\,|W]-\sigma[\,\cdot\,|U]\|_{TV}\big\rangle_\mu<\varepsilon.
\]
Further, $\mu$ is $\varepsilon$‐regular with respect to a partition $\mathcal{V}$ if there is a set $J\subseteq[\#\mathcal{V}]$ such that $\sum_{i\in[\#\mathcal{V}]\setminus J}|V_i|<\varepsilon n$ and such that $\mu$ is $\varepsilon$‐regular on $V_i$ for all $i\in J$. Additionally, if $\mathcal{V}$ is a partition of $[n]$ and $\mathcal{S}$ is a partition of $\Omega^n$, then we say that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$ if there is a subset $I\subseteq[\#\mathcal{S}]$ such that the following is true.

  • HM1: We have $\mu(S_i)>0$ for all $i\in I$ and $\sum_{i\in[\#\mathcal{S}]\setminus I}\mu(S_i)<\varepsilon$.

  • HM2: for all $i\in[\#\mathcal{S}]$ and $j\in[\#\mathcal{V}]$ we have $\max_{\sigma,\sigma'\in S_i}\|\sigma[\,\cdot\,|V_j]-\sigma'[\,\cdot\,|V_j]\|_{TV}<\varepsilon$.

  • HM3: for all $i\in I$ the measure $\mu[\,\cdot\,|S_i]$ is $\varepsilon$‐regular with respect to $\mathcal{V}$.

  • HM4: $\mu$ is $\varepsilon$‐regular with respect to $\mathcal{V}$.
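The regularity condition above can at least be probed numerically. The sketch below — our construction, assuming only a sampler `draw()` for $\mu$ and the helpers from the Section 1.2 sketch — estimates $\langle\|\sigma[\,\cdot\,|W]-\sigma[\,\cdot\,|U]\|_{TV}\rangle_\mu$ for a few random subsets $W$; since $\varepsilon$‐regularity quantifies over all large $W$, this is a heuristic probe rather than a verification.

```python
# Monte Carlo probe (ours) of epsilon-regularity on a set U.
import random

def regularity_probe(draw, U, Omega, eps, samples=200, subsets=20,
                     rng=random):
    """Largest estimated <||sigma[.|W] - sigma[.|U]||_TV>_mu over a few
    random W of size eps*|U|; uses empirical() and tv() from Section 1.2."""
    worst = 0.0
    for _ in range(subsets):
        W = rng.sample(list(U), max(1, int(eps * len(U))))
        sigmas = [draw() for _ in range(samples)]
        avg = sum(tv(empirical(s, W, Omega), empirical(s, U, Omega))
                  for s in sigmas) / samples
        worst = max(worst, avg)
    return worst   # epsilon-regularity on U would require worst < eps
```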

Theorem 2.1

For any $\varepsilon>0$ there exists $N=N(\varepsilon,\Omega)>0$ such that for every $n>N$, any measure $\mu\in\mathcal{P}(\Omega^n)$ and any partition $\mathcal{V}_0$ of $[n]$ of size $\#\mathcal{V}_0\leq1/\varepsilon$ the following is true. There exist a refinement $\mathcal{V}$ of $\mathcal{V}_0$ and a partition $\mathcal{S}$ of $\Omega^n$ such that $\#\mathcal{V}+\#\mathcal{S}\leq N$ and such that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$.

Informally speaking, Theorem 2.1 shows that any probability measure $\mu\in\mathcal{P}(\Omega^n)$ admits a partition $(\mathcal{V},\mathcal{S})$ such that the following is true. Almost the entire probability mass of $\mu$ belongs to parts $S_i$ such that the conditional measure $\mu[\,\cdot\,|S_i]$ is $\varepsilon$‐regular w.r.t. $\mathcal{V}$. This means that almost every coordinate $x\in[n]$ belongs to a class $V_j$ such that for every “large” $U\subseteq V_j$, for $\sigma$ chosen from $\mu[\,\cdot\,|S_i]$ the empirical distribution $\sigma[\,\cdot\,|U]$ is very likely close to the marginal distribution $\langle\sigma[\,\cdot\,|V_j]\rangle_{\mu[\cdot|S_i]}$ of the entire class.

Theorem 2.1 and its proof, which we defer to Section 2.3, are inspired by Szemerédi's regularity lemma 49. Let us proceed to state a few consequences of Theorem 2.1.

An $(\varepsilon,k)$‐state of $\mu$ is a set $S\subseteq\Omega^n$ such that $\mu(S)>0$ and
\[
\frac{1}{n^k}\sum_{x_1,\dots,x_k\in[n]}\big\|\mu_{\{x_1,\dots,x_k\}}[\,\cdot\,|S]-\mu_{x_1}[\,\cdot\,|S]\otimes\cdots\otimes\mu_{x_k}[\,\cdot\,|S]\big\|_{TV}<\varepsilon.
\]

In other words, if we choose $x_1,\dots,x_k\in[n]$ independently and uniformly at random, then the expected total variation distance between the joint distribution $\mu_{\{x_1,\dots,x_k\}}[\,\cdot\,|S]$ of $\sigma(x_1),\dots,\sigma(x_k)$ and the product $\mu_{x_1}[\,\cdot\,|S]\otimes\cdots\otimes\mu_{x_k}[\,\cdot\,|S]$ of the marginal distributions is small.
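The defining quantity of an $(\varepsilon,k)$‐state can likewise be estimated from samples. The sketch below (ours; it assumes a list of samples drawn from $\mu[\,\cdot\,|S]$ and reuses `tv` from the Section 1.2 sketch) averages the distance between the empirical joint law of $k$ random coordinates and the product of the empirical marginals.

```python
# Sampling estimate (ours) of the (eps,k)-state defect of mu[.|S].
import itertools, math, random
from collections import Counter

def state_defect(samples, k, Omega, trials=50, rng=random):
    n, N = len(samples[0]), len(samples)
    total = 0.0
    for _ in range(trials):
        xs = [rng.randrange(n) for _ in range(k)]
        joint = {t: c / N for t, c in
                 Counter(tuple(s[x] for x in xs) for s in samples).items()}
        marg = [{w: sum(s[x] == w for s in samples) / N for w in Omega}
                for x in xs]
        prod = {t: math.prod(m[w] for m, w in zip(marg, t))
                for t in itertools.product(Omega, repeat=k)}
        total += tv(joint, prod)   # tv() from the Section 1.2 sketch
    return total / trials          # small value: S behaves like a state
```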

Corollary 2.2

For any $\varepsilon>0$, $k\geq2$ there exists $\eta=\eta(\varepsilon,k,\Omega)>0$ such that for every $n>1/\eta$ any measure $\mu\in\mathcal{P}(\Omega^n)$ has pairwise disjoint $(\varepsilon,k)$‐states $S_1,\dots,S_N$ such that $\mu(S_i)\geq\eta$ for all $i\in[N]$ and $\sum_{i=1}^N\mu(S_i)\geq1-\varepsilon$.

Thus, we can chop the space $\Omega^n$ into subsets $S_1,\dots,S_N$, $N\leq1/\eta$, that capture almost the entire probability mass such that $\mu[\,\cdot\,|S_i]$ “resembles a product measure” for each $i\in[N]$. We prove Corollary 2.2 in Section 2.4.

Let us call $\mu$ $(\varepsilon,k)$‐symmetric if $S=\Omega^n$ itself is an $(\varepsilon,k)$‐state.

Corollary 2.3

For any $\varepsilon,k$ there exists $\delta$ such that for any $\xi>0$ there is $\eta>0$ such that for all $n>1/\eta$ and all $\mu\in\mathcal{P}(\Omega^n)$ the following is true. If for any two $(\xi,2)$‐states $S_1,S_2$ with $\mu(S_1),\mu(S_2)\geq\eta$ we have
\[
\frac1n\sum_{x\in[n]}\big\|\mu_x[\,\cdot\,|S_1]-\mu_x[\,\cdot\,|S_2]\big\|_{TV}<\delta, \tag{2.3}
\]
then $\mu$ is $(\varepsilon,k)$‐symmetric.

Thus, the entire measure μ “resembles a product measure” if extensive states have similar marginal distributions. Conversely, we have the following.

Corollary 2.4

For any $\varepsilon>0,\eta>0$ there exists $\delta>0$ such that for all $n>1/\delta$ and all $\mu\in\mathcal{P}(\Omega^n)$ the following is true. If $\mu$ is $(\delta,2)$‐symmetric, then for any $S$ with $\mu(S)\geq\eta$ we have
\[
\frac1n\sum_{x\in[n]}\big\|\mu_x[\,\cdot\,|S]-\mu_x\big\|_{TV}<\varepsilon.
\]

The proofs of Corollaries 2.3 and 2.4 can be found in Sections 2.5 and 2.6, respectively. Finally, in Section 2.7 we prove the following fact that will be useful in Section 4.

Proposition 2.5

For any $\varepsilon>0$ there exists $\delta>0$ such that for large enough $n$ the following is true. If $\mu\in\mathcal{P}(\Omega^n)$ is $(\delta,2)$‐symmetric, then $\mu\otimes\mu\in\mathcal{P}(\Omega^n\times\Omega^n)$ is $(\varepsilon,2)$‐symmetric.

2.3. Proof of Theorem 2.1

Throughout this section we assume that $n$ is sufficiently large. To prove Theorem 2.1, guided by 49 we define the index of $\mu$ with respect to a partition $\mathcal{V}$ of $[n]$ as
\[
{\rm ind}_\mu(\mathcal{V})=\frac{1}{|\Omega|\,n}\sum_{\omega\in\Omega}\sum_{j\in[\#\mathcal{V}]}\sum_{x\in V_j}\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j])^2\big\rangle_\mu.
\]

The index can be viewed as a conditional variance (cf. 50). Indeed, choose $\mathbf{x}\in[n]$ uniformly and independently of $\sigma$. Furthermore, let $\mathfrak{F}_{\mathcal{V}}$ be the $\sigma$‐algebra generated by the events $\{\mathbf{x}\in V_i\}$ for $i\in[\#\mathcal{V}]$. Writing $\mathrm{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ for the expectation and variance with respect to the choice of $\mathbf{x}$ only, we see that
\[
{\rm ind}_\mu(\mathcal{V})=\frac{1}{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{V}}]\big\rangle_\mu.
\]
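For a measure given explicitly as a dictionary of probabilities, the index can be computed directly from its definition. The following sketch (ours) does exactly that, reusing `empirical` from the Section 1.2 sketch.

```python
# Direct evaluation (ours) of ind_mu(V) for an explicit mu.
def index(mu, partition, Omega):
    """mu: dict mapping assignments (tuples over Omega) to probabilities;
    partition: list of lists of coordinates."""
    n = len(next(iter(mu)))
    ind = 0.0
    for sigma, weight in mu.items():
        for V in partition:
            block = empirical(sigma, V, Omega)   # sigma[.|V_j]
            for x in V:
                for w in Omega:
                    point = 1.0 if sigma[x] == w else 0.0   # sigma[w|x]
                    ind += weight * (point - block[w]) ** 2
    return ind / (len(Omega) * n)
```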

Lemma 2.6

For any partition $\mathcal{V}$ of $[n]$ we have ${\rm ind}_\mu(\mathcal{V})\in[0,1]$. If $\mathcal{W}$ is a refinement of $\mathcal{V}$, then ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})$.

The fact that ${\rm ind}_\mu(\mathcal{V})\in[0,1]$ is immediate from the definition. Moreover, if $\mathcal{W}$ refines $\mathcal{V}$, then $\mathfrak{F}_{\mathcal{V}}\subseteq\mathfrak{F}_{\mathcal{W}}$. Consequently, $\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]|\mathfrak{F}_{\mathcal{W}}]\rangle_\mu\leq\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]|\mathfrak{F}_{\mathcal{V}}]\rangle_\mu$. Averaging over $\omega\in\Omega$ yields ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})$.

Lemma 2.7

If $\mu\in\mathcal{P}(\Omega^n)$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}$, then there is a refinement $\mathcal{W}$ of $\mathcal{V}$ such that $\#\mathcal{W}\leq2\#\mathcal{V}$ and ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})-\varepsilon^4/(4|\Omega|^3)$.

Let $\bar J$ be the set of all indices $j\in[\#\mathcal{V}]$ such that there exists $U\subseteq V_j$ of size $|U|\geq\varepsilon|V_j|$ such that
\[
\big\langle\|\sigma[\,\cdot\,|U]-\sigma[\,\cdot\,|V_j]\|_{TV}\big\rangle_\mu\geq\varepsilon. \tag{2.4}
\]
Since $\mu$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}$ we have
\[
\sum_{j\in\bar J}|V_j|\geq\varepsilon n. \tag{2.5}
\]
For each $j\in\bar J$ pick a set $U_j\subseteq V_j$, $|U_j|\geq\varepsilon|V_j|$, such that (2.4) is satisfied. Then there exists $\omega_j\in\Omega$ such that
\[
\big\langle|\sigma[\omega_j|U_j]-\sigma[\omega_j|V_j]|\big\rangle_\mu\geq\varepsilon/(2|\Omega|). \tag{2.6}
\]

Let $\mathcal{W}$ be the partition obtained from $\mathcal{V}$ by splitting each class $V_j$, $j\in\bar J$, into the sub‐classes $U_j$ and $V_j\setminus U_j$. Clearly, $\#\mathcal{W}\leq2\#\mathcal{V}$. Furthermore,

\[
\begin{aligned}
{\rm ind}_\mu(\mathcal{V})&=\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{V}}]\big\rangle_\mu
=\frac1{|\Omega|}\sum_{\omega\in\Omega}\Big(\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\big\rangle_\mu+\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\Big)\\
&={\rm ind}_\mu(\mathcal{W})+\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu. \tag{2.7}
\end{aligned}
\]

If $j\in\bar J$ then (2.6) implies that on $V_j$ we have
\[
\big\langle\mathrm{Var}\big[\mathrm{E}[\sigma[\omega_j|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\geq\frac{|U_j|}{|V_j|}\big\langle(\sigma[\omega_j|U_j]-\sigma[\omega_j|V_j])^2\big\rangle_\mu\geq\frac{\varepsilon^3}{4|\Omega|^2}. \tag{2.8}
\]

Hence, combining (2.5) and (2.8), we find
\[
\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\geq\frac{\varepsilon^4}{4|\Omega|^3}. \tag{2.9}
\]

Finally, the assertion follows from (2.7) and (2.9).

Proof of Theorem 2.1

The set $\mathcal{P}(\Omega)$ is compact. Therefore, there exists a partition $\mathcal{Q}=(Q_1,\dots,Q_K)$ of $\mathcal{P}(\Omega)$ into pairwise disjoint sets such that for all $i\in[K]$ and any two measures $\mu,\mu'\in Q_i$ we have $\|\mu-\mu'\|_{TV}<\varepsilon$.

Given any partition $\mathcal{W}$ of $[n]$, we can construct a corresponding decomposition $\mathcal{S}(\mathcal{W})$ of $\Omega^n$ as follows. Call $\sigma,\sigma'\in\Omega^n$ $\mathcal{W}$‐equivalent if for every $i\in[\#\mathcal{W}]$ there exists $j\in[\#\mathcal{Q}]$ such that $\sigma[\,\cdot\,|W_i],\sigma'[\,\cdot\,|W_i]\in Q_j$. Then $\mathcal{S}(\mathcal{W})$ consists of the equivalence classes.

We construct the desired partition $\mathcal{V}$ of $[n]$ inductively, starting from any given partition $\mathcal{V}^{(0)}$ of size at most $1/\varepsilon$. The construction stops once $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(t)},\mathcal{S}(\mathcal{V}^{(t)}))$. Assuming that this is not the case, we obtain $\mathcal{V}^{(t+1)}$ from $\mathcal{V}^{(t)}$ as follows. If $\mu$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}^{(t)}$, then we let $\mathcal{V}^{(t+1)}$ be the partition promised by Lemma 2.7, which guarantees that
\[
\#\mathcal{V}^{(t+1)}\leq2\#\mathcal{V}^{(t)}\quad\text{and}\quad{\rm ind}_\mu(\mathcal{V}^{(t+1)})\leq{\rm ind}_\mu(\mathcal{V}^{(t)})-\varepsilon^4/(4|\Omega|^3). \tag{2.10}
\]

Otherwise let $\mathcal{S}^{(t)}=\mathcal{S}(\mathcal{V}^{(t)})$ and $s(t)=\#\mathcal{S}^{(t)}$ for the sake of brevity. Further, let $\mu_{i,t}=\mu[\,\cdot\,|S_i^{(t)}]$ for $i\in[s(t)]$ with $\mu[S_i^{(t)}]>0$. Moreover, let $\bar I(t)$ be the set of all $i\in[s(t)]$ such that $\mu[S_i^{(t)}]>0$ and $\mu_{i,t}$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}^{(t)}$. If $\mu$ fails to be $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(t)},\mathcal{S}^{(t)})$ but $\mu$ is $\varepsilon$‐regular w.r.t. $\mathcal{V}^{(t)}$, then
\[
\sum_{i\in\bar I(t)}\mu[S_i^{(t)}]\geq\varepsilon. \tag{2.11}
\]

Lemma 2.7 shows that for any $i\in\bar I(t)$ there exists a refinement $\mathcal{W}(t,i)$ of $\mathcal{V}^{(t)}$ such that
\[
{\rm ind}_{\mu_{i,t}}(\mathcal{W}(t,i))\leq{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})-\varepsilon^4/(4|\Omega|^3). \tag{2.12}
\]

Let $\mathcal{V}^{(t+1)}$ be the coarsest common refinement of all the partitions $(\mathcal{W}(t,i))_{i\in\bar I(t)}$. Then
\[
\#\mathcal{V}^{(t+1)}\leq\#\mathcal{V}^{(t)}\cdot2^{\#\mathcal{Q}^{\#\mathcal{V}^{(t)}}}. \tag{2.13}
\]

In addition, (2.12) and Lemma 2.6 imply
\[
{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t+1)})\leq{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})-\mathbf{1}\{i\in\bar I(t)\}\,\varepsilon^4/(4|\Omega|^3). \tag{2.14}
\]

Therefore, by (2.11), (2.14) and Bayes’ rule,
\[
\begin{aligned}
{\rm ind}_\mu(\mathcal{V}^{(t+1)})&=\frac{1}{n|\Omega|}\sum_{\omega\in\Omega}\sum_{j\in[\#\mathcal{V}^{(t+1)}]}\sum_{x\in V_j^{(t+1)}}\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j^{(t+1)}])^2\big\rangle_\mu\\
&=\frac{1}{n|\Omega|}\sum_{\omega,j,x}\ \sum_{i\in[s(t)]:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j^{(t+1)}])^2\big\rangle_{\mu_{i,t}}\\
&=\sum_{i:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t+1)})
\leq-\varepsilon^5/(4|\Omega|^3)+\sum_{i:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})\\
&={\rm ind}_\mu(\mathcal{V}^{(t)})-\varepsilon^5/(4|\Omega|^3). \tag{2.15}
\end{aligned}
\]
Combining (2.10), (2.15) and Lemma 2.6, we conclude that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(T)},\mathcal{S}^{(T)})$ for some $T\leq4|\Omega|^3/\varepsilon^5$. Finally, (2.13) entails that $\#\mathcal{V}^{(T)},\#\mathcal{S}^{(T)}$ are bounded in terms of $\varepsilon,\Omega$ only.

2.4. Proof of Corollary 2.2

To derive Corollary 2.2 from Theorem 2.1 we use the following handy sufficient condition for (ε,k)‐symmetry.

Lemma 2.8

For any $k\geq2,\varepsilon>0$ there is $\delta=\delta(\varepsilon,k,\Omega)$ such that for large enough $n$ the following is true. Assume that $\mu\in\mathcal{P}(\Omega^n)$ is $\delta$‐regular with respect to a partition $\mathcal{V}$ and set $\bar\mu_i(\cdot)=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$ for $i\in[\#\mathcal{V}]$. If
\[
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu<\delta, \tag{2.16}
\]
then $\mu$ is $(\varepsilon,k)$‐symmetric.

Choose a small $\xi=\xi(\varepsilon,k,\Omega)>0$ and a smaller $\delta=\delta(\xi)>0$. Then (2.16) implies that there is $J\subseteq[\#\mathcal{V}]$ satisfying
\[
\sum_{j\in J}|V_j|\geq(1-\xi)n \tag{2.17}
\]
such that for all $j\in J$, $U\subseteq V_j$, $|U|\geq\xi|V_j|$ we have
\[
\big\langle\|\sigma[\,\cdot\,|U]-\bar\mu_j\|_{TV}\big\rangle_\mu\leq\xi. \tag{2.18}
\]

In particular, we claim that (2.18) implies the following (if $\xi$ is small enough):
\[
\forall\omega\in\Omega,\ j\in J,\ \Sigma\subseteq\Omega^n\text{ with }\mu(\Sigma)\geq\xi^{1/4}:\quad
\big|\{x\in V_j:|\langle\sigma[\omega|x]\,|\,\Sigma\rangle_\mu-\bar\mu_j(\omega)|>\xi^{1/4}\}\big|\leq\xi^{1/4}|V_j|. \tag{2.19}
\]

Indeed, assume that $\langle\mathbf{1}\{\sigma\in\Sigma\}\rangle_\mu\geq\xi^{1/4}$ and $|\{x\in V_j:|\langle\sigma[\omega_0|x]\,|\,\Sigma\rangle_\mu-\bar\mu_j(\omega_0)|>\xi^{1/4}\}|>\xi^{1/4}|V_j|$ for some $\omega_0\in\Omega$. Then, because $\langle\sigma[\,\cdot\,|x]\,|\,\Sigma\rangle_\mu$ is a probability measure on $\Omega$ for every $x$, there exists $\omega\in\Omega$ such that the set $U=\{x\in V_j:\langle\sigma[\omega|x]\,|\,\Sigma\rangle_\mu<\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|\}$ has size $|U|>\xi^{1/4}|V_j|/(2|\Omega|)$. In particular, $\langle\sigma[\omega|U]\,|\,\Sigma\rangle_\mu\leq\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|$. Therefore, by Markov's inequality,
\[
\big\langle\mathbf{1}\{\sigma[\omega|U]\leq\bar\mu_j(\omega)-\xi^{1/3}\}\,\big|\,\Sigma\big\rangle_\mu
\geq1-\frac{\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|}{\bar\mu_j(\omega)-\xi^{1/3}}
\geq\frac{\xi^{1/4}/|\Omega|-\xi^{1/3}}{1-\xi^{1/3}}\geq\xi^{1/4}/(2|\Omega|).
\]
Consequently, we obtain
\[
\big\langle\|\sigma[\,\cdot\,|U]-\bar\mu_j\|_{TV}\big\rangle_\mu\geq\xi^{1/3+1/4}\,\langle\mathbf{1}\{\sigma\in\Sigma\}\rangle_\mu/(2|\Omega|)\geq\xi^{7/8}.
\]
Since $|U|>\xi^{1/4}|V_j|/(2|\Omega|)>\xi|V_j|$, this contradicts (2.18).

Now, fix any $\omega_1,\dots,\omega_k\in\Omega$ and let $x_1,\dots,x_k\in[n]$ be chosen independently and uniformly at random. Let $\Sigma_h=\Sigma_h(x_1,\dots,x_h)\subseteq\Omega^n$ be the event that $\sigma(x_i)=\omega_i$ for all $i\leq h$. We are going to show that for $0\leq h<k$,
\[
\mathrm{E}\big[\mu(\Sigma_h)\,\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\big]<\xi^{1/5}. \tag{2.20}
\]

In the case $h=0$ there is nothing to show. As for the inductive step, condition on $x_1,\dots,x_h$.

  • Case 1: $\mu(\Sigma_h)\leq\xi^{1/4}$. Regardless of the choice of $x_{h+1}$ we have
\[
\mu(\Sigma_h)\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\leq\xi^{1/4}.
\]
  • Case 2: $\mu(\Sigma_h)>\xi^{1/4}$. Due to (2.17), with probability at least $1-2\xi$ we have $x_{h+1}\in V_j\setminus\{x_1,\dots,x_h\}$ for some $j\in J$. Hence, (2.19) implies $\mathrm{E}_{x_{h+1}}\big[|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu|\big]\leq\xi^{1/4}$.

Hence, (2.20) follows.

To complete the proof, we are going to show by induction on $h\in[k]$ that
\[
\mathrm{E}\Big|\Big\langle\prod_{i=1}^h\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^h\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\leq h\,\xi^{1/5}. \tag{2.21}
\]
For $h=1$ there is nothing to show. To proceed from $h$ to $h+1$ we use the triangle inequality to write
\[
\begin{aligned}
\mathrm{E}\Big[\Big|\Big\langle\prod_{i=1}^{h+1}\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^{h+1}\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\Big]
&\leq\mathrm{E}\big[\mu(\Sigma_h)\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\big]\\
&\quad+\mathrm{E}\Big[\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\Big|\Big\langle\prod_{i=1}^{h}\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^{h}\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\Big].
\end{aligned}
\]
Invoking the induction hypothesis and (2.20) completes the proof.

Proof of Corollary 2.2

For a small enough $\delta=\delta(\varepsilon,k)>0$ let $(\mathcal{V},\mathcal{S})$ be a pair of partitions of size at most $N=N(\delta,\Omega)$ such that $\mu$ is $\delta/2$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$, as guaranteed by Theorem 2.1. Let $\eta=\varepsilon/(2N)$ and let $J$ be the set of all $j\in[\#\mathcal{S}]$ such that $\mu(S_j)\geq\eta$ and such that $\mu[\,\cdot\,|S_j]$ is $\delta$‐regular with respect to $\mathcal{V}$. Then
\[
\sum_{j\in[\#\mathcal{S}]\setminus J}\mu(S_j)\leq\delta+\varepsilon/2<\varepsilon.
\]
Furthermore, for every $j\in J$ the measure $\mu[\,\cdot\,|S_j]$ satisfies (2.16) due to HM2. Therefore, Lemma 2.8 implies that $\mu[\,\cdot\,|S_j]$ is $(\varepsilon,k)$‐symmetric. Consequently, the sets $(S_j)_{j\in J}$ are pairwise disjoint $(\varepsilon,k)$‐states with $\mu(S_j)\geq\eta$ for all $j\in J$ and $\sum_{j\in J}\mu(S_j)\geq1-\varepsilon$.

2.5. Proof of Corollary 2.3

Pick small enough $\delta=\delta(\varepsilon,k,\Omega)$, $\gamma=\gamma(\delta,\xi)$, $\eta=\eta(\gamma)>0$. Then by Theorem 2.1, $\mu$ is $\gamma$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$ for partitions that satisfy $\#\mathcal{V}+\#\mathcal{S}\leq N=N(\gamma)$. Let $J\subseteq[\#\mathcal{S}]$ contain all $j$ such that $\mu[\,\cdot\,|S_j]$ is $\gamma$‐regular with respect to $\mathcal{V}$ and such that $\mu(S_j)\geq\eta$. Let $\bar\mu_{i,j}=\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}$. Then by HM2, for every $j\in J$ we have
\[
\frac1n\sum_{i\in[\#\mathcal{V}]}|V_i|\,\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_{i,j}\|_{TV}\big\rangle_{\mu[\cdot|S_j]}<3\gamma.
\]

Therefore, Lemma 2.8 implies that $S_j$ is a $(\xi,2)$‐state. Consequently, our assumption (2.3) and the triangle inequality entail that for all $j,j'\in J$,
\[
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\|\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}-\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_{j'}]}\big\|_{TV}<\delta. \tag{2.22}
\]

Choosing $\eta$ small, we can ensure that $\sum_{j\notin J}\mu(S_j)\leq\delta$. Therefore, letting $\bar\mu_i=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$, we obtain from (2.22)
\[
\begin{aligned}
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu
&\leq\delta+\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\sum_{j\in J}\mu(S_j)\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_{\mu[\cdot|S_j]}\\
&\leq2\delta+\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\sum_{j\in J}\mu(S_j)\big\|\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}-\bar\mu_i\big\|_{TV}\quad[\text{by HM2}]\\
&\leq5\delta. \tag{2.23}
\end{aligned}
\]
Since $\mu$ is $\gamma$‐regular and thus $5\delta$‐regular w.r.t. $\mathcal{V}$ by HM4, (2.23) and Lemma 2.8 imply that $\mu$ is $(\varepsilon,k)$‐symmetric.

2.6. Proof of Corollary 2.4

Choose $\delta=\delta(\varepsilon,\eta)$ small enough, assume that $S\subseteq\Omega^n$ satisfies $\mu(S)\geq\eta$ and that $\mu$ is $(\delta,2)$‐symmetric. Assume for contradiction that
\[
\frac1n\sum_{x\in[n]}\|\mu_x[\,\cdot\,|S]-\mu_x\|_{TV}>\varepsilon. \tag{2.24}
\]

Let
\[
W=\Big\{x\in[n]:\|\mu_x[\,\cdot\,|S]-\mu_x\|_{TV}\geq\varepsilon/2\Big\},\qquad
W^{\pm1}(\omega)=\Big\{x\in W:\pm\big(\mu_x[\omega|S]-\mu_x(\omega)\big)\geq\frac{\varepsilon}{4|\Omega|}\Big\}\quad(\omega\in\Omega).
\]
Then (2.24) entails that $|W|\geq\varepsilon n/2$. Therefore, there is $\omega\in\Omega$ such that $|W^s(\omega)|\geq\varepsilon n/(4|\Omega|)$ for either $s=+1$ or $s=-1$. Let $W=W^s(\omega)$ for the sake of brevity. Of course, by the definition of $W$,

\[
\big(\langle\sigma[\omega|W]\rangle_{\mu[\cdot|S]}-\langle\sigma[\omega|W]\rangle_\mu\big)^2\geq\frac{\varepsilon^2}{16|\Omega|^2}. \tag{2.25}
\]

Since $\mu$ is $(\delta,2)$‐symmetric,
\[
\big\langle(\sigma[\omega|W]-\langle\tau[\omega|W]\rangle_\mu)^2\big\rangle_\mu=\frac{1}{|W|^2}\sum_{x,y\in W}\big[\langle\sigma[\omega|x]\,\sigma[\omega|y]\rangle_\mu-\langle\tau[\omega|x]\rangle_\mu\langle\tau[\omega|y]\rangle_\mu\big]\leq\frac{16\,\delta|\Omega|^2}{\varepsilon^2}. \tag{2.26}
\]

On the other hand we have
\[
\big\langle(\sigma[\omega|W]-\langle\tau[\omega|W]\rangle_\mu)^2\big\rangle_\mu\geq\mu(S)\big(\langle\tau[\omega|W]\rangle_{\mu[\cdot|S]}-\langle\tau[\omega|W]\rangle_\mu\big)^2. \tag{2.27}
\]

Finally, plugging (2.25) and (2.26) into (2.27), we find $16\delta|\Omega|^2/\varepsilon^2\geq\eta\varepsilon^2/(32|\Omega|^2)$, which is a contradiction if $\delta$ is small enough.

2.7. Proof of Proposition 2.5

Choose small enough $\alpha=\alpha(\varepsilon,\Omega)$, $\gamma=\gamma(\alpha)>0$, $\chi=\chi(\gamma)>0$ and an even smaller $\delta=\delta(\gamma,\chi)>0$ and assume that $\mu$ is $(\delta,2)$‐symmetric. Suppose that $\mu$ is $\chi$‐homogeneous with respect to a partition $(\mathcal{V},\mathcal{S})$ such that $\#\mathcal{V}+\#\mathcal{S}\leq N=N(\gamma)$, as promised by Theorem 2.1. Let $J$ be the set of all $j\in[\#\mathcal{S}]$ such that $\mu(S_j)\geq\gamma^2/N$. Moreover, let $I$ be the set of all $i\in[\#\mathcal{V}]$ such that $\mu$ is $\chi$‐regular on $V_i$ and $|V_i|\geq\gamma n/N$. By Corollary 2.4 we have
\[
\frac{1}{|V_i|}\sum_{x\in V_i}\|\mu_x[\,\cdot\,|S_j]-\mu_x\|_{TV}<\gamma\quad\text{for all }i\in I,\ j\in J,
\]
provided that $\delta$ is chosen small enough. Therefore, letting $\bar\mu_i=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$, for all $i\in I$ we have
\[
\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu<2\gamma. \tag{2.28}
\]

Fix some $i\in I$. We claim that $\mu\otimes\mu$ is $\alpha$‐regular on $V_i$. Hence, let $U\subseteq V_i$ be a set of size $|U|\geq\alpha|V_i|$ and let
\[
\mathcal{E}=\big\{\|\sigma[\,\cdot\,|U]-\bar\mu_i\|_{TV}<\gamma^{1/3}\big\}.
\]
Then (2.28) implies that $\langle\mathbf{1}\{\sigma\notin\mathcal{E}\}\rangle_\mu<\gamma^{1/3}$, because $\mu$ is $\gamma$‐regular on $V_i$. Now, fix some $\sigma\in\mathcal{E}$. For $\omega\in\Omega$ let $U(\sigma,\omega)=\{x\in U:\sigma(x)=\omega\}$. Let
\[
\mathcal{E}(\sigma,\omega)=\big\{\|\tau[\,\cdot\,|U(\sigma,\omega)]-\bar\mu_i\|_{TV}<\gamma^{1/3}\big\}.
\]
If $|U(\sigma,\omega)|\geq\gamma^{1/2}|U|$, then due to (2.28) and $\gamma$‐regularity we obtain, by a similar token as previously, $\langle\mathbf{1}\{\tau\notin\mathcal{E}(\sigma,\omega)\}\rangle_\mu\leq\gamma^{1/3}$. Consequently, the event $\mathcal{E}(\sigma)$ that $\mathcal{E}(\sigma,\omega)$ occurs for all $\omega$ satisfying $|U(\sigma,\omega)|\geq\gamma^{1/2}|U|$ has probability at least $1-|\Omega|\gamma^{1/3}$. Therefore, for any $\omega,\omega'\in\Omega$ we obtain

\[
\begin{aligned}
&\Big\langle\Big|\frac{1}{|U|}\sum_{x\in U}\mathbf{1}\{\sigma(x)=\omega\}\mathbf{1}\{\tau(x)=\omega'\}-\bar\mu_i(\omega)\bar\mu_i(\omega')\Big|\Big\rangle_{\mu\otimes\mu}\\
&\qquad\leq(|\Omega|+1)\gamma^{1/3}+\Big\langle\Big|\frac{1}{|U|}\sum_{x\in U}\mathbf{1}\{\sigma(x)=\omega\}\mathbf{1}\{\tau(x)=\omega'\}-\bar\mu_i(\omega)\bar\mu_i(\omega')\Big|\ \Big|\ \sigma\in\mathcal{E},\tau\in\mathcal{E}(\sigma)\Big\rangle_{\mu\otimes\mu}\\
&\qquad\leq\gamma^{1/7}+\Big\langle\max_{\omega:|U(\sigma,\omega)|\geq\gamma^{1/2}|U|}\big|\tau[\omega'|U(\sigma,\omega)]-\bar\mu_i(\omega')\big|\ \Big|\ \sigma\in\mathcal{E},\tau\in\mathcal{E}(\sigma)\Big\rangle_{\mu\otimes\mu}\leq\gamma^{1/8}.
\end{aligned}
\]

Summing over all $\omega,\omega'$ and choosing $\gamma$ small enough, we conclude that $\mu\otimes\mu$ is $\alpha$‐regular on $V_i$.

Finally, (2.28) implies that $\mu\otimes\mu$ satisfies
\[
\big\langle\|(\sigma\otimes\tau)[\,\cdot\,|V_i]-\bar\mu_i\otimes\bar\mu_i\|_{TV}\big\rangle_{\mu\otimes\mu}<\alpha.
\]

Therefore, picking α small enough, we can apply Lemma 2.8 to conclude that μμ is (ε,2)‐symmetric.

3. FACTOR GRAPHS

3.1. Examples

The aim in this section is to set up a comprehensive framework for the study of “random factor graphs” and their corresponding Gibbs measures. To get started let us ponder a few concrete examples.

In the Ising model on a graph $G=(V,E)$ the variables of the problem are just the vertices of the graph. The values available to each variable are $\pm1$. Thus, an assignment is simply a map $\sigma:V\to\{\pm1\}$. Moreover, each edge of $G$ gives rise to a constraint. Specifically, given a parameter $\beta>0$ we define a weight function $\psi_e$ corresponding to the edge $e=\{v,w\}$ by letting $\psi_e(\sigma)=\exp(\beta\sigma(v)\sigma(w))$. Thus, an edge $e=\{v,w\}$ gives larger weight to assignments $\sigma$ with $\sigma(v)=\sigma(w)$ than to those with $\sigma(v)\neq\sigma(w)$. The corresponding partition function reads

\[
Z_\beta(G)=\sum_{\sigma:V\to\{\pm1\}}\prod_{e\in E}\psi_e(\sigma)=\sum_{\sigma:V\to\{\pm1\}}\exp\Big[\beta\sum_{\{v,w\}\in E}\sigma(v)\sigma(w)\Big].
\]

Further, the Gibbs distribution $\mu_{G,\beta}$ induced by $G,\beta$ is the probability measure on $\{\pm1\}^V$ defined by

\[
\mu_{G,\beta}(\sigma)=\frac{1}{Z_\beta(G)}\prod_{e\in E}\psi_e(\sigma)=\frac{1}{Z_\beta(G)}\exp\Big[\beta\sum_{\{v,w\}\in E}\sigma(v)\sigma(w)\Big].
\]

Thus, $\mu_{G,\beta}$ weighs assignments according to the number of edges $e=\{v,w\}$ such that $\sigma(v)=\sigma(w)$.
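For a small graph the Ising partition function can be checked by brute force; a minimal sketch (ours):

```python
# Brute-force Ising partition function (ours) on a small graph.
import itertools, math

def ising_Z(n, edges, beta):
    """Z_beta(G) = sum over sigma in {-1,+1}^n of
    exp(beta * sum over edges {v,w} of sigma(v)*sigma(w))."""
    return sum(math.exp(beta * sum(s[v] * s[w] for v, w in edges))
               for s in itertools.product((-1, 1), repeat=n))

print(ising_Z(4, [(0, 1), (1, 2), (2, 3), (3, 0)], beta=0.5))  # a 4-cycle
```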

The Ising model has been studied extensively in the mathematical physics literature on various classes of graphs, including and particularly random graphs. For instance, if $G(n,d)$ is a random regular graph of degree $d$ on $n$ vertices, then $Z_\beta(G(n,d))$ is known to “converge” to the value predicted by the cavity method 22. Formally, the cavity method yields a certain number $F(\beta,d)$ such that
\[
\lim_{n\to\infty}\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]=F(\beta,d). \tag{3.1}
\]

Because $Z_\beta(G(n,d))$ is exponential in $n$ with high probability, the scaling applied in (3.1) is the appropriate one to obtain a finite limit. Furthermore, by Azuma's inequality $\ln Z_\beta(G(n,d))$ is concentrated about its expectation. Therefore, (3.1) implies that $\frac1n\ln Z_\beta(G(n,d))$ converges to $F(\beta,d)$ in probability.

The Potts antiferromagnet on a graph $G=(V,E)$ can be viewed as a twist on the Ising model. In this case we look at assignments $\sigma:V\to[k]$ for some number $k\geq3$. The weight functions associated with the edges are defined by $\psi_e(\sigma)=\exp(-\beta\mathbf{1}\{\sigma(v)=\sigma(w)\})$ for some $\beta>0$. Thus, this time the edges prefer that the incident vertices receive different values. The Gibbs measure and the partition function read
\[
\mu_{G,\beta}(\sigma)=\frac{1}{Z_\beta(G)}\exp\Big[-\beta\sum_{\{v,w\}\in E}\mathbf{1}\{\sigma(v)=\sigma(w)\}\Big],\qquad
Z_\beta(G)=\sum_{\sigma:V\to[k]}\exp\Big[-\beta\sum_{\{v,w\}\in E}\mathbf{1}\{\sigma(v)=\sigma(w)\}\Big].
\]

While it is known that $\lim_n\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]$ exists and that $\ln Z_\beta(G(n,d))$ is concentrated about its expectation 15, the precise value remains elusive for a wide range of $d,\beta$ (in contrast to the ferromagnetic version of the model 24). However, it is not difficult to see that for sufficiently large values of $d,\beta$ we have 12
\[
\lim_{n\to\infty}\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]<\lim_{n\to\infty}\frac1n\ln\mathrm{E}[Z_\beta(G(n,d))].
\]

Hence, just like in the random k‐SAT model the first moment overshoots the actual value of the partition function by an exponential factor. The Potts model is closely related to the k‐colorability problem. Indeed, if we think of the k possible values as colors, then for large β the Gibbs measure concentrates on colorings with few monochromatic edges.

As a third example let us consider the following version of the random k‐SAT model. Let $k\geq3,\Delta>1$ be fixed integers, let $V_n=\{x_1,\dots,x_n\}$ be a set of Boolean variables and let $d_n:V_n\times\{\pm1\}\to[\Delta]$ be a map such that
\[
m=\sum_{x\in V_n}\big(d_n(x,1)+d_n(x,-1)\big)/k
\]
is an integer. Then we let $\Phi(n,k,d_n)$ be a random k‐CNF formula with $m$ clauses in which each variable $x\in V_n$ appears precisely $d_n(x,1)$ times as a positive literal and precisely $d_n(x,-1)$ times as a negative literal. As in Section 1, for a clause $a$ and a truth assignment $\sigma:V\to\{0,1\}$ we let $\psi_a(\sigma)=\exp(-\beta\mathbf{1}\{\sigma\text{ violates }a\})$. Then for a given parameter $\beta>0$ we obtain a Gibbs measure that weighs assignments by the number of clauses that they violate and a corresponding partition function $Z_\beta(\Phi(n,k,d_n))$, cf. (1.1), (1.2). Hence, for given $\beta>0,k\geq3$ and degree assignments $(d_n)_n$ the problem of determining $\lim_n\frac1n\mathrm{E}[\ln Z_\beta(\Phi(n,k,d_n))]$ arises. This question is anything but straightforward, even in the special case that $d_n(x,\pm1)=d_0$ is the same for all $x$. In 11 we show how the results of the present paper can be put to work to tackle this case.

3.2. Random Factor Graphs

The following definition encompasses a variety of concrete models.

Definition 3.1

Let $\Delta>0$ be an integer, let $\Omega,\Theta$ be finite sets and let $\Psi=\{\psi_1,\dots,\psi_l\}$ be a finite set of functions $\psi_i:\Omega^{h_i}\to(0,\infty)$ of arity $h_i\in[\Delta]$. A $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}=(V,F,d,t,(\psi_a)_{a\in F})$ consists of

  • M1: a countable set $V$ of variable nodes,

  • M2: a countable set $F$ of constraint nodes,

  • M3: a map $d:V\cup F\to[\Delta]$ such that
\[
\sum_{x\in V}d(x)=\sum_{a\in F}d(a), \tag{3.2}
\]
  • M4: a map $t:C_V\cup C_F\to\Theta$, where we let
\[
C_V=\bigcup_{x\in V}\{x\}\times[d(x)],\qquad C_F=\bigcup_{a\in F}\{a\}\times[d(a)],
\]
such that
\[
|t^{-1}(\theta)\cap C_V|=|t^{-1}(\theta)\cap C_F|\quad\text{for each }\theta\in\Theta, \tag{3.3}
\]
  • M5: a map $F\to\Psi$, $a\mapsto\psi_a$, such that $\psi_a:\Omega^{d(a)}\to(0,\infty)$ for all $a\in F$.

The size of the model is $\#\mathcal{M}=|V|$. Furthermore, an $\mathcal{M}$‐factor graph is a bijection $G:C_V\to C_F$, $(x,i)\mapsto G(x,i)$, such that $t(G(x,i))=t(x,i)$ for all $(x,i)\in C_V$.

Of course, (3.2) and (3.3) require that either both quantities are infinite or both are finite.

The semantics is that Δ is the maximum degree of a factor graph. Moreover, Ω is the set of possible values that the variables of the model range over, e.g., the set {±1} in the Ising model. Further, Θ is a set of “types”. For instance, in the random k‐SAT model the types can be used to specify the signs of the literals. Additionally, Ψ is a set of possible weight functions.

A model $\mathcal{M}$ comes with a set $V$ of variable nodes and a set $F$ of constraint nodes. The degrees of these nodes are prescribed by the map $d$. Just like in the “configuration model” of graphs with a given degree sequence, we create $d(v)$ “clones” of each node $v$. The sets $C_V$, $C_F$ contain the clones of the variable and constraint nodes, respectively. Further, the map $t$ assigns a type to each “clone” of either a constraint or variable node, and each constraint node $a$ comes with a weight function $\psi_a$.

An $\mathcal{M}$‐factor graph is a type‐preserving matching $G$ of the variable and constraint clones. Let $\mathcal{G}(\mathcal{M})$ be the set of all $\mathcal{M}$‐factor graphs and write $\mathbf{G}=\mathbf{G}(\mathcal{M})$ for a uniformly random sample from $\mathcal{G}(\mathcal{M})$. Contracting the clones of each node, we obtain a bipartite (multi‐)graph with variable nodes $V$ and constraint nodes $F$. We often identify $G$ with this multi‐graph. For instance, if we speak of the distance of two vertices in $G$ we mean the length of a shortest path in this multi‐graph.

For a clone $(x,i)\in C_V$ we denote by $\partial(G,x,i)=G(x,i)$ the clone that $G$ matches $(x,i)$ to. Similarly, for $(a,j)\in C_F$ we write $\partial(G,a,j)$ for the variable clone $(x,i)$ such that $\partial(G,x,i)=(a,j)$. Moreover, for a variable $x$ we let $\partial(G,x)=\{\partial(G,x,i):i\in[d(x)]\}$ and analogously for $a\in F$ we set $\partial(G,a)=\{\partial(G,a,j):j\in[d(a)]\}$. To economise notation we sometimes identify a clone $(x,i)$ with the underlying variable $x$. For instance, if $\sigma:V\to\Omega$ is an assignment, then we take the liberty of writing $\sigma(x,i)=\sigma(x)$. Additionally, where convenient we view $\partial(G,x)$ as the set of all constraint nodes $a\in F$ such that there exist $i\in[d(x)]$, $j\in[d(a)]$ with $\partial(G,x,i)=(a,j)$. The corresponding convention applies to $\partial(G,a)$.

An $\mathcal{M}$‐assignment is a map $\sigma:V\to\Omega$ and we define
\[
\psi_{G,a}(\sigma)=\psi_a\big(\sigma(G(a,1)),\dots,\sigma(G(a,d(a)))\big)\quad\text{for }a\in F,\qquad
\psi_G(\sigma)=\prod_{a\in F}\psi_{G,a}(\sigma).
\]

Further, the Gibbs distribution and the partition function of $G$ are
\[
\mu_G(\sigma)=\psi_G(\sigma)/Z(G),\quad\text{where }Z(G)=\sum_{\sigma:V\to\Omega}\psi_G(\sigma). \tag{3.4}
\]

We denote expectations with respect to the Gibbs measure by $\langle\cdot\rangle_G=\langle\cdot\rangle_{\mu_G}$.
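To make (3.4) concrete, here is a deliberately simplified sketch (ours) in which a factor graph is handed over directly as a list of constraints — each an ordered tuple of incident variables together with its weight function — glossing over the clone-and-type bookkeeping of Definition 3.1.

```python
# Simplified generic partition function (ours), cf. (3.4).
import itertools, math

def Z(n, Omega, constraints):
    """constraints: list of (variables, psi) with psi defined on tuples
    over Omega; returns sum_sigma prod_a psi_a(sigma restricted to a)."""
    return sum(math.prod(psi(tuple(sigma[x] for x in xs))
                         for xs, psi in constraints)
               for sigma in itertools.product(Omega, repeat=n))

beta = 0.5
ising_edge = lambda s: math.exp(beta * s[0] * s[1])  # arity-two weight
print(Z(3, (-1, 1), [((0, 1), ising_edge), ((1, 2), ising_edge)]))
```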

The fundamental problem that arises is the study of the random variable $\ln Z(\mathbf{G})$. As mentioned in Section 1, this random variable holds the key to getting a handle on the Gibbs measure and thus the combinatorics of the problem. The following proposition establishes concentration about the expectation. For two factor graphs $G,G'\in\mathcal{G}(\mathcal{M})$ let
\[
{\rm dist}(G,G')=|\{(x,i)\in C_V:\partial(G,x,i)\neq\partial(G',x,i)\}|. \tag{3.5}
\]

Proposition 3.2

For any $\Delta,\Omega,\Theta,\Psi$ there exists $\eta=\eta(\Delta,\Omega,\Theta,\Psi)>0$ such that for any $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}$ of size $n=\#\mathcal{M}\geq1/\eta$ and any $\varepsilon>0$ we have
\[
P\big[|\ln Z(\mathbf{G})-\mathrm{E}[\ln Z(\mathbf{G})]|>\varepsilon n\big]\leq\exp(-\eta\varepsilon^2n).
\]

There exists a number $\rho>0$ that depends on $\Delta,\Omega,\Psi,\Theta$ only such that for any two factor graphs $G,G'\in\mathcal{G}(\mathcal{M})$ we have $|\ln Z(G)-\ln Z(G')|\leq\rho\cdot{\rm dist}(G,G')$. Therefore, the assertion follows from Azuma's inequality.

Thus, Proposition 3.2 reduces our task to calculating the expectation $\mathrm{E}[\ln Z(\mathbf{G})]$. Generally, the standard first and second moment methods do not suffice to tackle this problem because the logarithm sits inside the expectation. While, of course, Jensen's inequality guarantees that
\[
\mathrm{E}[\ln Z(\mathbf{G})]\leq\ln\mathrm{E}[Z(\mathbf{G})], \tag{3.6}
\]
equality does not typically hold. In fact, we already saw examples where $\ln\mathrm{E}[Z(\mathbf{G})]-\mathrm{E}[\ln Z(\mathbf{G})]$ is linear in the size $\#\mathcal{M}$ of the model. If so, then the Paley‐Zygmund inequality (1.3) entails that $\ln(\mathrm{E}[Z(\mathbf{G})^2]/\mathrm{E}[Z(\mathbf{G})]^2)$ is linear in $\#\mathcal{M}$ as well, dooming the second moment method. Furthermore, even if $\mathrm{E}[\ln Z(\mathbf{G})]\sim\ln\mathrm{E}[Z(\mathbf{G})]$ the second moment method does not generally succeed 20. Let us now revisit the examples from Section 3.1.

Example 3.3

(The Ising model on the random d‐regular graph.) Suppose that $d\geq2,\beta>0$. Let $\Delta=d$, $\Omega=\{\pm1\}$, $\Psi=\{\psi\}$, where $\psi:\{\pm1\}^2\to(0,\infty)$, $(\sigma_1,\sigma_2)\mapsto\exp(\beta\sigma_1\sigma_2)$, and set $\Theta=\{0\}$. Further, given $n\geq1$ such that $dn$ is even we define a $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}(d,n)$ by letting $V=\{x_1,\dots,x_n\}$, $F=\{a_1,\dots,a_{dn/2}\}$, $d(x)=d$ for all $x\in V$, $d(a)=2$ for all $a\in F$, $t(x,i)=t(f,j)=0$ for all $(x,i)\in C_V,(f,j)\in C_F$, and $\psi_a=\psi$ for all $a\in F$. Thus, all clones have the same “type” and all constraint nodes have arity two and the same weight function. Hence, the random graph $\mathbf{G}(\mathcal{M})$ is obtained by matching the $dn$ variable clones randomly to the $dn$ constraint clones. If we simply replace the constraint nodes, which have degree two, by edges joining the two adjacent variable nodes, then the resulting random multigraph is contiguous to the uniformly random d‐regular graph on $n$ vertices. In the model $\mathcal{M}$, (3.6) holds with (asymptotic) equality for all $d,\beta$ 22.

Example 3.4

(The Potts antiferromagnet on the random d‐regular graph.) The construction is similar to the previous example, except that $\Omega=[k]$ is the set of colors and $\psi(\sigma_1,\sigma_2)=\exp(-\beta\mathbf{1}\{\sigma_1=\sigma_2\})$. In this example (3.6) holds with asymptotic equality if either $d\leq d_0(k)$, or $d>d_0(k)$ and $\beta\leq\beta_0(d,k)$, for certain critical values $d_0(k),\beta_0(d,k)$. However, for sufficiently large $d,\beta$ there occurs a linear gap 12, 21.

Example 3.5

(Random k‐SAT.) To capture the random k‐SAT model we let $\Delta>0$ be a maximum degree and $\Omega=\Theta=\{\pm1\}$. Further, each $s\in\{\pm1\}^k$ gives rise to a function
\[
\psi_s:\{\pm1\}^k\to(0,\infty),\qquad\sigma\mapsto\exp(-\beta\mathbf{1}\{\sigma=s\})
\]
and we let $\Psi=\{\psi_s:s\in\{\pm1\}^k\}$. The idea is that $s$ is the “sign pattern” of a k‐clause, with $s_i=\pm1$ indicating that the $i$th literal is positive/negative. Then a truth assignment $\sigma$ of the $k$ variables is satisfying unless $\sigma_i=s_i$ for all $i$. The corresponding model $\mathcal{M}$ has a set $V=\{x_1,\dots,x_n\}$ of Boolean variables and a set $F=\{a_1,\dots,a_m\}$ of clauses. Moreover, the map $d:V\to[\Delta]$ prescribes the degree of each variable, while of course each clause has degree $k$. Additionally, the map $t:C_V\cup C_F\to\Theta=\{\pm1\}$ prescribes the positive/negative occurrences of the variables and the sign patterns of the clauses. Thus, a variable $x$ occurs $|\{i\in[d(x)]:t(x,i)=\pm1\}|$ times positively/negatively and the $j$th literal of a clause $a$ is positive iff $t(a,j)=1$. Finally, the weight function of clause $a$ is $\psi_{(t(a,1),\dots,t(a,k))}$. The bound (3.6) does not generally hold with equality 5, 11.

While Definition 3.1 encompasses many problems of interest, there are two restrictions. First, because all weight functions $\psi\in\Psi$ take strictly positive values, Definition 3.1 does not allow for “hard” constraints. For instance, Definition 3.1 does not accommodate the graph coloring problem, which imposes the strict requirement that no single edge be monochromatic. However, for some purposes hard constraints can be approximated by soft ones, e.g., by choosing a very large value of $\beta$ in the Potts antiferromagnet. Moreover, many of the arguments in the following sections do extend to hard constraints with a bit of care. However, the assumption that all $\psi$ are strictly positive saves us many case distinctions as it ensures that $Z(G)$ is strictly positive and that therefore the Gibbs measure is well‐defined.

The second restriction is that we prescribe a fixed maximum degree $\Delta$. Thus, if we consider a sequence $\underline{\mathcal{M}}=(\mathcal{M}_n)_n$ of $(\Delta,\Omega,\Psi,\Theta)$‐models with $\#\mathcal{M}_n=n$, then all factor graphs have a bounded degree. By comparison, if we choose a k‐SAT formula with $n$ variables and $m=\alpha n/k$ clauses uniformly at random for fixed $k\geq3,\alpha>0$, then the maximum variable degree will be of order $\ln n/\ln\ln n$. Yet this case can be approximated well by a sequence of models with a large enough maximum degree $\Delta$. In fact, if we calculate $\mathrm{E}[\ln Z]$ for any fixed $\Delta$, then the $\Delta\to\infty$ limit is easily seen to yield the answer in the case of uniformly random formulas. Nevertheless, the bounded degree assumption is technically convenient because it facilitates the use of local weak convergence, as we will discuss next.

Remark 3.6

For the sake of simplicity in (3.4) we defined the partition function as the sum over all $\sigma:V\to\Omega$. However, the results stated in the following carry over to the cases where $Z$ is defined as the sum over all configurations in a subset $C_{\mathcal{M}}\subseteq\Omega^V$, e.g., all $\sigma$ that have Hamming distance at most $\alpha n$ from some reference assignment $\sigma_0$ for a fixed $\alpha>0$. Of course, in this case the Gibbs measure is defined such that its support is equal to $C_{\mathcal{M}}$.

3.3. Local Weak Convergence

Suppose that we fix $\Delta,\Omega,\Psi,\Theta$ as in Definition 3.1 and that $\underline{\mathcal{M}}=(\mathcal{M}_n)_n$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models such that $\mathcal{M}_n=(V_n,F_n,d_n,t_n,(\psi_a)_{a\in F_n})$ has size $n$. Let us write $\mathbf{G}=\mathbf{G}(\mathcal{M}_n)$ for the sake of brevity. According to the cavity method, $\lim_n\frac1n\mathrm{E}[\ln Z(\mathbf{G})]$ is determined by the “limiting local structure” of the random factor graph $\mathbf{G}$. To formalise this concept, we adapt the notion of local weak convergence of graph sequences 8, 35 to our current setup, thereby generalising the approach taken in 23.

Definition 3.7

A $(\Delta,\Omega,\Psi,\Theta)$‐template consists of a $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}$, a connected factor graph $H\in\mathcal{G}(\mathcal{M})$ and a root $r_H$, which is a variable or factor node. Its size is $\#\mathcal{M}$. Moreover, two templates $H,H'$ with models $\mathcal{M}=(V,F,d,t,(\psi_a))$, $\mathcal{M}'=(V',F',d',t',(\psi'_a))$ are isomorphic if there exists a bijection $\pi:V\cup F\to V'\cup F'$ such that

  • ISM1: $\pi(r_H)=r_{H'}$,

  • ISM2: $\pi(V)=V'$ and $\pi(F)=F'$,

  • ISM3: $d(v)=d'(\pi(v))$ for all $v\in V\cup F$,

  • ISM4: $t(v,i)=t'(\pi(v),i)$ for all $(v,i)\in C_V\cup C_F$,

  • ISM5: $\psi_a=\psi'_{\pi(a)}$ for all $a\in F$, and

  • ISM6: if $(x,i)\in C_V$, $(a,j)\in C_F$ satisfy $\partial(H,x,i)=(a,j)$, then $\partial(H',\pi(x),i)=(\pi(a),j)$.

Thus, a template is, basically, a finite or countably infinite connected factor graph with a distinguished root. Moreover, an isomorphism preserves the root as well as degrees, types, weight functions and adjacencies.

Let us write $[H]$ for the isomorphism class of a template and let $\mathfrak{G}=\mathfrak{G}(\Delta,\Omega,\Theta,\Psi)$ be the set of all isomorphism classes of $(\Delta,\Omega,\Psi,\Theta)$‐templates. For each $[H]\in\mathfrak{G}$ and $\ell\geq1$ let $\partial^\ell[H]$ be the isomorphism class of the template obtained by removing all vertices at a distance greater than $\ell$ from the root. We endow $\mathfrak{G}$ with the coarsest topology that makes all the functions
\[
\Gamma\in\mathfrak{G}\mapsto\mathbf{1}\{\partial^\ell[\Gamma]=[\Gamma_0]\}\in\{0,1\}\qquad(\ell\geq1,\ \Gamma_0\in\mathfrak{G})
\]
continuous. Moreover, the space $\mathcal{P}(\mathfrak{G})$ of probability measures on $\mathfrak{G}$ carries the weak topology. So does the space $\mathcal{P}^2(\mathfrak{G})$ of probability measures on $\mathcal{P}(\mathfrak{G})$. For $\Gamma\in\mathfrak{G}$ we write $\delta_\Gamma\in\mathcal{P}(\mathfrak{G})$ for the Dirac measure that puts mass one on the single point $\Gamma$. Similarly, for $\lambda\in\mathcal{P}(\mathfrak{G})$ we let $\delta_\lambda\in\mathcal{P}^2(\mathfrak{G})$ be the Dirac measure on $\lambda$. Our assumption that the maximum degree is bounded by a fixed number $\Delta$ ensures that $\mathfrak{G},\mathcal{P}(\mathfrak{G}),\mathcal{P}^2(\mathfrak{G})$ are compact Polish spaces.

For a factor graph $G\in\mathcal{G}(\mathcal{M}_n)$ and a variable or constraint node $v$ we write $[G,v]$ for the isomorphism class of the connected component of $v$ in $G$ rooted at $v$. Then each factor graph $G\in\mathcal{G}(\mathcal{M}_n)$ gives rise to the empirical distribution
\[
\lambda_G=\frac{1}{|V_n|+|F_n|}\sum_{v\in V_n\cup F_n}\delta_{[G,v]}\in\mathcal{P}(\mathfrak{G}).
\]

We say that $\underline{\mathcal{M}}$ converges locally to $\vartheta\in\mathcal{P}(\mathfrak{G})$ if
\[
\lim_{n\to\infty}\mathrm{E}[\delta_{\lambda_{\mathbf{G}}}]=\delta_\vartheta. \tag{3.7}
\]

Denote a random isomorphism class chosen from the distribution $\vartheta$ by $\mathbf{T}=\mathbf{T}_\vartheta$. Unravelling the definitions, we see that (3.7) holds iff for every integer $\ell>0$ and every $[H]\in\mathfrak{G}$ we have
\[
\frac{1}{|V_n|+|F_n|}\sum_{v\in V_n\cup F_n}\mathbf{1}\{\partial^\ell[\mathbf{G},v]=[H]\}\ \xrightarrow{\ n\to\infty\ }\ P[\partial^\ell\mathbf{T}_\vartheta=[H]]\quad\text{in probability.} \tag{3.8}
\]
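The local statistics appearing in (3.8) can be computed in simple cases by canonically encoding each depth‐$\ell$ neighborhood. The sketch below (ours) does this for a plain adjacency-list structure, ignoring types and weight functions for brevity; excluding the parent makes the encoding match the tree picture on acyclic neighborhoods.

```python
# Sketch (ours) of empirical depth-ell neighborhood statistics, cf. (3.8).
from collections import Counter

def code(adj, v, depth, parent=None):
    """Canonical nested-tuple encoding of the depth-`depth` tree that the
    factor graph unfolds to around v (types/weights omitted)."""
    if depth == 0:
        return ()
    return tuple(sorted(code(adj, u, depth - 1, v)
                        for u in adj[v] if u != parent))

def empirical_local(adj, depth):
    """Fraction of nodes carrying each neighborhood code."""
    cnt = Counter(code(adj, v, depth) for v in range(len(adj)))
    return {c: k / len(adj) for c, k in cnt.items()}

# a path x0 - a0 - x1 - a1 - x2 (variables and constraints alternating)
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(empirical_local(adj, 2))
```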

We are going to be interested in the case that $\underline{\mathcal{M}}$ converges locally to a distribution $\vartheta$ on acyclic templates. Thus, let $\mathfrak{T}$ be the set of all acyclic templates. Further, we write $\mathfrak{V}$ for the set of all templates whose root is a variable node and $\mathfrak{F}$ for the set of all templates whose root is a constraint node. Additionally, for a template $[H]$ we write $r_{[H]}$ for the root vertex, $d_{[H]}$ for its degree and $\psi_{[H]}$ for the weight function of the root vertex if $[H]\in\mathfrak{F}$. Moreover, for $j\in[d_{[H]}]$ we write $[H]_j$ for the template obtained from $[H]$ by re‐rooting the template at the $j$th neighbor of $r_{[H]}$. (This makes sense because condition ISM6 from Definition 3.7 preserves the order of the neighbors.)

We will frequently condition on the depth‐$\ell$ neighborhoods of the random factor graph $\mathbf{G}$ for some finite $\ell$. Hence, for $G,G'\in\mathcal{G}(\mathcal{M}_n)$ and $\ell\geq1$ we write $G\equiv_\ell G'$ if $\partial^\ell[G,x]=\partial^\ell[G',x]$ for all variable nodes $x\in V_n$ and $\partial^{\ell+1}[G,a]=\partial^{\ell+1}[G',a]$ for all constraint nodes $a\in F_n$. Let $\mathcal{T}_\ell=\mathcal{T}_{\ell,\mathcal{M}_n}$ be the $\sigma$‐algebra on $\mathcal{G}(\mathcal{M}_n)$ generated by the equivalence classes of the relation $\equiv_\ell$. Additionally, for $G\in\mathcal{G}(\mathcal{M}_n)$ and $\ell\geq0$ we let
\[
\lambda_{G,\ell}=\frac{1}{|V_n|+|F_n|}\Big[\sum_{x\in V_n}\delta_{\partial^\ell[G,x]}+\sum_{a\in F_n}\delta_{\partial^{\ell+1}[G,a]}\Big]
\]
be the empirical distribution of the depth‐$\ell$ neighborhood structure.

Furthermore, let
\[
\mathfrak{T}_\ell=\{\partial^\ell T:T\in\mathfrak{T}\cap\mathfrak{V}\}\cup\{\partial^{\ell+1}T:T\in\mathfrak{T}\cap\mathfrak{F}\}.
\]
Then for a probability measure $\vartheta\in\mathcal{P}(\mathfrak{T})$ we denote by $\vartheta_\ell$ the image of $\vartheta$ under the map
\[
T\in\mathfrak{T}\ \mapsto\ \begin{cases}\partial^\ell T&\text{if }T\in\mathfrak{T}\cap\mathfrak{V},\\ \partial^{\ell+1}T&\text{if }T\in\mathfrak{T}\cap\mathfrak{F}.\end{cases}
\]

Because all degrees are bounded by $\Delta$, the set $\mathfrak{T}_\ell$ is finite for every $\ell\geq1$. Hence, (3.8) entails that $\underline{\mathcal{M}}$ converges locally to $\vartheta\in\mathcal{P}(\mathfrak{T})$ iff
\[
\lim_{n\to\infty}\mathrm{E}\|\lambda_{\mathbf{G},\ell}-\vartheta_\ell\|_{TV}=0\quad\text{for every }\ell\geq1. \tag{3.9}
\]

3.4. The Planted Distribution

While $\mathbf{G}$ is chosen uniformly at random (from the configuration model), we need to consider another distribution that weighs factor graphs by their partition function. Specifically, given $\ell\geq0$ let $\hat{\mathbf{G}}=\hat{\mathbf{G}}_{\ell,\mathcal{M}_n}$ be a random graph chosen according to the distribution
\[
P[\hat{\mathbf{G}}=G]=Z(G)\cdot\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}=G\}}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big]\qquad(G\in\mathcal{G}(\mathcal{M}_n)), \tag{3.10}
\]
which we call the planted distribution. The definition (3.10) ensures that the distribution of the “depth‐$\ell$ neighborhood structure” of $\hat{\mathbf{G}}$ coincides with that of $\mathbf{G}$.

Perhaps more intuitively, the planted distribution can be described by the following experiment. First, choose a random factor graph $\mathbf{G}$. Then, given $\mathbf{G}$, choose the factor graph $\hat{\mathbf{G}}$ randomly such that a graph $G\equiv_\ell\mathbf{G}$ comes up with a probability that is proportional to $Z(G)$. Perhaps despite appearances, the planted distribution is reasonably easy to work with in many cases. For instance, it has been employed successfully to study random k‐SAT as well as random graph or hypergraph coloring problems 1, 11, 13, 20, 26.
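Operationally, the experiment just described is a reweighting step. A rejection-sampling sketch (ours, assuming a conditional sampler for the $\mathcal{T}_\ell$‐class of $\mathbf{G}$ and an upper bound on $Z$ over that class — both hypothetical inputs) would look as follows.

```python
# Rejection-sampling sketch (ours) of the planted distribution (3.10)
# within one T_ell-equivalence class.
import random

def planted_sample(draw_conditional, Z, z_max, rng=random):
    """draw_conditional(): uniform sample from the T_ell-class of G;
    z_max: any upper bound on Z over that class (assumed available)."""
    while True:
        G = draw_conditional()
        if rng.random() < Z(G) / z_max:   # accept proportionally to Z(G)
            return G
```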

3.5. Short Cycles

In most cases of interest the random factor graph is unlikely to contain many short cycles, and it will be convenient for us to exploit this fact. Hence, let us call a factor graph $G$ $l$‐acyclic if it does not contain a cycle of length at most $l$. We say that the sequence $\underline{\mathcal{M}}$ of models has high girth if for any $\ell,l>0$ we have
\[
\liminf_{n\to\infty}P[\mathbf{G}\text{ is }l\text{-acyclic}]>0,\qquad\liminf_{n\to\infty}P[\hat{\mathbf{G}}\text{ is }l\text{-acyclic}]>0. \tag{3.11}
\]

Thus, there is a non‐vanishing probability that the random factor graph G is l‐acyclic. Moreover, short cycles do not have too heavy an impact on the partition function as the graph chosen from the planted distribution has a non‐vanishing probability of being l‐acyclic as well.

In the following, we are going to denote the event that a random factor graph is $l$‐acyclic by $\mathcal{A}_l$. Let us highlight the following consequence of the high girth condition and the construction of the planted distribution.

Proposition 3.8

Assume that $\underline{\mathcal{M}}$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models of high girth. Let $\ell\geq1$ be an integer and suppose that $\mathcal{E}$ is an event such that $\lim_{n\to\infty}P[\hat{\mathbf{G}}\in\mathcal{E}]=1$. If $b$ is a real and $l\geq0$ is an integer such that
\[
\lim_{n\to\infty}P\big[\ln\mathrm{E}[Z(\mathbf{G})|\mathcal{T}_\ell]\geq bn\,\big|\,\mathcal{A}_l\big]=1, \tag{3.12}
\]
then $\liminf_{n\to\infty}\frac1n\ln\mathrm{E}[\mathbf{1}\{\mathcal{E}\cap\mathcal{A}_l\}Z(\mathbf{G})]\geq b$.

Since $\lim_nP[\hat{\mathbf{G}}\in\mathcal{E}]=1$, the high girth condition (3.11) implies that $\lim_nP[\hat{\mathbf{G}}\in\mathcal{E}|\mathcal{A}_l]=1$ for every $l$. Set $\mathcal{E}_l=\mathcal{E}\cap\mathcal{A}_l$. Then by the definition (3.10) of the planted distribution,
\[
1-o(1)=P[\hat{\mathbf{G}}\in\mathcal{E}_l|\mathcal{A}_l]=\sum_{G\in\mathcal{E}_l}Z(G)\,\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}=G\}}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big]=\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z(\mathbf{G})}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big]
=\mathrm{E}\Big[\frac{\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big].
\]
Consequently, $P\big[\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]\geq\mathrm{E}[Z|\mathcal{T}_\ell]/2\,\big|\,\mathcal{A}_l\big]=1-o(1)$. Hence, (3.12) yields
\[
P\big[\ln\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]\geq bn-1\,\big|\,\mathcal{A}_l\big]=1-o(1).
\]
Therefore, the assertion follows from (3.11).

Remark 3.9

Strictly speaking, the first condition in (3.11) is superfluous as it is implied by the second one.

From here on out we assume that $\underline{\mathcal{M}}$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models of high girth that converges locally to $\vartheta\in\mathcal{P}(\mathfrak{T})$, and we fix $\Delta,\Omega,\Theta,\Psi$ for the rest of the paper.

4. THE BETHE FREE ENERGY

In this section we present the main results of the paper. The thrust is that certain basic properties of the Gibbs measure entail an asymptotic formula for $\mathrm{E}[\ln Z(\mathbf{G})]$. The results are guided by the physics predictions from 34.

4.1. An Educated Guess

The formula for $\mathrm{E}[\ln Z(\mathbf{G})]$ that the cavity method predicts, the so‐called “replica symmetric solution”, comes in terms of the distribution $\vartheta$ to which $\underline{\mathcal{M}}$ converges locally. Thus, the cavity method claims that in order to calculate $\mathrm{E}[\ln Z(\mathbf{G})]$ it is not necessary to deal with the mind‐boggling complexity of the random factor graph with its expansion properties, long cycles etc. Instead, it suffices to think about the random tree $\mathbf{T}=\mathbf{T}_\vartheta$, a dramatically simpler object. The following definition will help us formalise this notion.

Definition 4.1

A marginal assignment is a measurable map
\[
p:\mathfrak{T}\to\bigcup_{j=1}^{\Delta}\mathcal{P}(\Omega^j),\qquad T\mapsto p_T
\]
such that

  • MA1: $p_T\in\mathcal{P}(\Omega)$ for all $T\in\mathfrak{V}$,

  • MA2: $p_T\in\mathcal{P}(\Omega^{d_T})$ and $(p_T)_j=p_{T_j}$ for all $T\in\mathfrak{F}$, $j\in[d_T]$,

  • MA3: for all $T\in\mathfrak{F}$ we have
\[
H(p_T)+\langle\ln\psi_T(\sigma)\rangle_{p_T}=\max\big\{H(\nu)+\langle\ln\psi_T(\sigma)\rangle_\nu:\nu\in\mathcal{P}(\Omega^{d_T})\ \text{s.t.}\ \nu_j=p_{T_j}\ \text{for all }j\in[d_T]\big\}. \tag{4.1}
\]

Further, the Bethe free energy of $p$ with respect to $\vartheta$ is
\[
\mathcal{B}_\vartheta(p)=\mathrm{E}\big[(1-d_{\mathbf{T}})H(p_{\mathbf{T}})\,\big|\,\mathfrak{V}\big]+\frac{P[\mathbf{T}\in\mathfrak{F}]}{P[\mathbf{T}\in\mathfrak{V}]}\,\mathrm{E}\big[H(p_{\mathbf{T}})+\langle\ln\psi_{\mathbf{T}}(\sigma)\rangle_{p_{\mathbf{T}}}\,\big|\,\mathfrak{F}\big], \tag{4.2}
\]
where, of course, $\mathrm{E}[\cdot],P[\cdot]$ refer to the choice of the random tree $\mathbf{T}=\mathbf{T}_\vartheta$.
Thus, a marginal assignment provides a probability distribution $p_T$ on $\Omega$ for each tree whose root is a variable node. Furthermore, for trees $T$ rooted at a constraint node $p_T$ is a distribution on $\Omega^{d_T}$, which we think of as the joint distribution of the variables involved in the constraint. The distributions assigned to $T$ rooted at a constraint node must satisfy a consistency condition: the $j$th marginal of $p_T$ has to coincide with the distribution assigned to the tree $T_j$ rooted at the $j$th child of the root of $T$ for every $j\in[d_T]$; of course, $T_j$ is a tree rooted at a variable node. In addition, MA3 requires that for $T\in\mathfrak{F}$ the distribution $p_T$ maximises the functional $\nu\mapsto H(\nu)+\langle\ln\psi_T(\sigma)\rangle_\nu$ amongst all distributions $\nu$ with the same marginal distributions as $p_T$. Furthermore, the Bethe free energy is a functional that maps each marginal assignment $p$ to a real number. For a detailed derivation of this formula based on physics intuition we refer to 36.

The basic idea behind Definition 4.1 is to capture the limiting distribution of the marginals of the variables of the random factor graph $\mathbf{G}$. More specifically, Definition 4.1 aims to provide the “limiting object” of the following combinatorial construction: for a fixed $\ell$ take a random factor graph $\mathbf{G}$ on a large enough number $n$ of variable nodes and for each possible tree $T\in\mathfrak{T}_\ell$ record the empirical distribution of the marginals of the nodes whose neighborhood is isomorphic to $T$. In the simplest possible case (which here we confine ourselves to), in the limit of large $\ell,n$ we expect to obtain distributions that satisfy MA2. That is, in the limit of large $\ell$ the empirical distribution of the marginals of variable nodes converges to a deterministic limit; going from $\ell$ to $\ell+1$ (the depth up to which constraint nodes can “see”) does not make much of a difference. Moreover, in the proof of Corollary 4.8 we are going to see that MA3 is the “correct” way of linking the constraint/variable distributions.

Given a distribution $\vartheta$ on trees, the cavity method provides a plausible recipe for constructing marginal assignments. Roughly speaking, the idea is to identify fixed points of an operator called Belief Propagation on the random infinite tree 36. However, this procedure is difficult to formalise mathematically because generally there are several Belief Propagation fixed points and model‐dependent considerations are necessary to identify the “correct” one. To keep matters as simple as possible we are therefore going to assume that a marginal assignment is given.
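For orientation, here is what this recipe boils down to in the simplest concrete case, the Ising model on the random $d$‐regular graph from Example 3.3: iterate the Belief Propagation update on the infinite $d$‐regular tree and plug the fixed point into the standard vertex/edge form of the Bethe functional. The sketch is ours, and — as just discussed — the fixed point it finds need not be the “correct” one in general.

```python
# BP fixed point and Bethe free energy per variable (ours) for the Ising
# model psi(s,s') = exp(beta*s*s') on the infinite d-regular tree.
import math

def bethe_ising(beta, d, iters=1000, init=0.9):
    m = init                                   # message: P[sigma = +1]
    for _ in range(iters):                     # BP update over d-1 children
        up = (math.exp(beta) * m + math.exp(-beta) * (1 - m)) ** (d - 1)
        dn = (math.exp(-beta) * m + math.exp(beta) * (1 - m)) ** (d - 1)
        m = up / (up + dn)
    vertex = ((math.exp(beta) * m + math.exp(-beta) * (1 - m)) ** d
              + (math.exp(-beta) * m + math.exp(beta) * (1 - m)) ** d)
    edge = (math.exp(beta) * (m * m + (1 - m) ** 2)
            + math.exp(-beta) * 2 * m * (1 - m))
    return math.log(vertex) - (d / 2) * math.log(edge)

print(bethe_ising(0.2, 3))   # approx log 2 + (3/2) log cosh(0.2)
```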

Remark 4.2

Because the entropy is concave, conditions MA2 and MA3 specify the distributions $p_T$ for $T\in\mathfrak{F}$ uniquely. In other words, a marginal assignment is actually determined completely by the distributions $p_T$ for $T\in\mathfrak{V}$.

For a marginal assignment $p$, an integer $\ell$ and a tree $T\in\mathfrak{T}\cap\mathfrak{V}$ we define
\[
p_{\ell,T}=\mathrm{E}\big[p_{\mathbf{T}}\,\big|\,\partial^\ell\mathbf{T}=\partial^\ell T\big].
\]
Thus, $p_{\ell,T}$ is the conditional expectation of $p_{\mathbf{T}}$ given the first $\ell$ layers of the tree. To avoid notational hazards we let $p_T,p_{\ell,T}$ be the uniform distribution on $\Omega$ for all $T\in\mathfrak{G}\setminus\mathfrak{T}$.

Lemma 4.3

For any $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have
\[
\mathrm{E}\big[\|p_{\ell,\mathbf{T}}-p_{\mathbf{T}}\|_{TV}\,\big|\,\mathbf{T}\in\mathfrak{V}\big]<\varepsilon.
\]

Define an equivalence relation on $\mathfrak{T}\cap\mathfrak{V}$ by letting $T\equiv_\ell T'$ iff $\partial^\ell T=\partial^\ell T'$. Then for any $\omega\in\Omega$ the sequence of random variables $X_\ell(\mathbf{T})=p_{\ell,\mathbf{T}}(\omega)$ is a martingale with respect to the filtration generated by the equivalence classes of $\equiv_\ell$. By the martingale convergence theorem [27, Theorem 5.7], $(p_{\ell,\mathbf{T}})_\ell$ converges $\vartheta$‐almost surely to $p_{\mathbf{T}}$.

Unless specified otherwise, in the rest of this section p is understood to be a marginal assignment.

4.2. Symmetry

In the terminology of Section 2, the cavity method claims that $\frac1n\mathrm{E}[\ln Z(\mathbf{G})]$ converges to the Bethe free energy of a suitable marginal assignment iff
\[
\lim_{n\to\infty}P[\mu_{\mathbf{G}}\text{ is }(\varepsilon,2)\text{-symmetric}]=1\quad\text{for any }\varepsilon>0\quad\text{(see 34)}. \tag{4.3}
\]

This claim is, of course, based on bold non‐rigorous deliberations. Nonetheless, we aim to prove a rigorous statement that comes reasonably close.

To this end, let $p$ be a marginal assignment. We say that $\underline M$ is p-symmetric if for every $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1{n^2}\sum_{x,y\in V_n}\big\|\mu_{\boldsymbol G,\{x,y\}}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\otimes p_{\ell,\partial^\ell[\boldsymbol G,y]}\big\|_{\mathrm{TV}}>\varepsilon\Big]=0.$ (4.4)

In other words, for any $\varepsilon>0$ for sufficiently large $\ell$ the random factor graph $\boldsymbol G$ enjoys the following property with high probability. If we pick two variable nodes $x,y$ of $\boldsymbol G$ uniformly and independently, then the joint distribution $\mu_{\boldsymbol G,\{x,y\}}$ is close to the product distribution $p_{\ell,\partial^\ell[\boldsymbol G,x]}\otimes p_{\ell,\partial^\ell[\boldsymbol G,y]}$ determined by the depth-$\ell$ neighborhoods of $x,y$. Of course, as $\boldsymbol G$ has bounded maximum degree the distance between randomly chosen $x,y$ is going to be greater than, say, $\ln\ln n$ with high probability. Thus, similar in spirit to (4.3), (4.4) provides that far-apart variables typically decorrelate and that $p$ captures the Gibbs marginals.
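To make the statistic in (4.4) concrete, the following brute-force sketch (ours) evaluates the average total variation distance between pairwise Gibbs marginals and the corresponding product distributions on a tiny factor graph, with the exact single-variable marginals standing in for the limits $p_{\ell,\cdot}$ (which are of course not available on a toy instance); diagonal pairs $x=y$ are included for simplicity.

```python
import numpy as np
from itertools import product

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# brute-force Gibbs measure of a tiny factor graph (weights ours)
psis = {(0, 1): np.array([[2., 1.], [1., 2.]]),
        (1, 2): np.array([[1., 3.], [3., 1.]]),
        (2, 3): np.array([[2., 1.], [1., 2.]])}
n, q = 4, 2
states = list(product(range(q), repeat=n))
w = np.array([np.prod([psis[f][s[f[0]], s[f[1]]] for f in psis])
              for s in states])
mu = w / w.sum()

def marg(xs):  # joint Gibbs marginal of the variables in xs
    out = np.zeros((q,) * len(xs))
    for s, p in zip(states, mu):
        out[tuple(s[x] for x in xs)] += p
    return out

# the symmetry statistic from (4.4), with the exact marginals in place
# of the limiting p_{l,.} -- an illustration, not the actual limit object
stat = np.mean([tv(marg([x, y]), np.outer(marg([x]), marg([y])))
                for x in range(n) for y in range(n)])
print(stat)
```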

In analogy to (4.4), we say that the planted distribution of $\underline M$ is p-symmetric if for every $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1{n^2}\sum_{x,y\in V_n}\big\|\mu_{\hat{\boldsymbol G},\{x,y\}}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\otimes p_{\ell,\partial^\ell[\hat{\boldsymbol G},y]}\big\|_{\mathrm{TV}}>\varepsilon\Big]=0.$

The main result of this paper is

Theorem 4.4

If $\underline M$ is p-symmetric, then

$\limsup_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]\leq\mathcal B_\vartheta(p).$

If the planted distribution of $\underline M$ is p-symmetric as well, then

$\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]=\mathcal B_\vartheta(p).$

Thus, the basic symmetry assumption (4.4) implies that $\mathcal B_\vartheta(p)$ is an upper bound on $\frac1n\mathbb E[\ln Z(\boldsymbol G)]$. If, additionally, the symmetry condition holds in the planted model, then this upper bound is tight. In particular, in this case $\frac1n\mathbb E[\ln Z(\boldsymbol G)]$ is completely determined by the limiting local structure $\vartheta$ and $p$.

The proof of Theorem 4.4, which can be found in Section 4.6, is based on Theorem 2.1, the decomposition theorem for probability measures on cubes. More precisely, we combine Theorem 2.1 with a conditional first and a second moment argument given the local structure of the factor graph, i.e., given $\mathfrak T_\ell$ for a large $\ell$. The fact that it is necessary to condition on the local structure in order to cope with "lottery effects" has been noticed in prior work [6, 17, 22, 23]. Most prominently, such a conditioning was crucial in order to obtain the precise k-SAT threshold for large enough k [26]. But here the key insight is that Theorem 2.1 enables us to carry out conditional moment calculations in a fairly elegant and generic way.

The obvious question that arises from Theorem 4.4 is whether there is a simple way to show that $\underline M$ is p-symmetric (and that the same is true of the planted distribution). In Sections 4.3 and 4.4 we provide two sufficient conditions called non-reconstruction and Gibbs uniqueness. That these two conditions entail symmetry was predicted in [34], and Theorem 2.1 enables us to prove it.

While the present paper deals with a very general class of factor graphs, the methods give somewhat stronger results in special classes, e.g., models with only one type in which all variable nodes have the same degree, or in which the variable nodes have Poisson degrees. The details have been worked out in [18, 19].

4.3. Non‐reconstruction

Following [34] we define a correlation decay condition, the "non-reconstruction" condition, on factor graphs and show that it implies symmetry. The basic idea is to formalise the following. Given $\varepsilon>0$ pick a large $\ell=\ell(\varepsilon)>1$, choose a random factor graph $\boldsymbol G$ for some large $n$ and pick a variable node $x$ uniformly at random. Further, sample an assignment $\sigma$ randomly from the Gibbs measure $\mu_{\boldsymbol G}$. Now, sample a second assignment $\tau$ from $\mu_{\boldsymbol G}$ subject to the condition that $\tau(y)=\sigma(y)$ for all variable nodes $y$ at distance at least $\ell$ from $x$. Then the non-reconstruction condition asks whether the distribution of $\tau(x)$ is markedly different from the unconditional marginal $\mu_{\boldsymbol G,x}$. More precisely, non-reconstruction occurs if for any $\varepsilon$ there is $\ell(\varepsilon)$ such that with high probability $\boldsymbol G$ is such that the shift that a random "boundary condition" $\sigma$ induces does not exceed $\varepsilon$ in total variation distance.

Of course, instead of conditioning on the values of all variables at distance at least $\ell$ from $x$, we might as well just condition on the variables at distance either $\ell$ or $\ell+1$ from $x$, depending on the parity of $\ell$. This is immediate from the definition (3.4) of the Gibbs measure.

As for the formal definition, suppose that $G\in\mathcal G(M_n)$ is a factor graph, let $x\in V_n$ and let $\ell\geq1$. Let $\mathcal F_\ell(G,x)$ signify the $\sigma$-algebra on $\Omega^n$ generated by the events $\mathbf 1\{\sigma(y)=\omega\}$ for $\omega\in\Omega$ and $y\in V_n$ at distance either $\ell$ or $\ell+1$ from $x$. Thus, $\mathcal F_\ell(G,x)$ pins down all $\sigma(y)$ for $y$ at distance $\ell$ from $x$ if $\ell$ is even and $\ell+1$ otherwise. Then we say that $\underline M$ has non-reconstruction with respect to a marginal assignment $p$ if for any $\varepsilon>0$ there is $\ell>0$ such that

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(\boldsymbol G,x)\big\rangle_{\boldsymbol G}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\boldsymbol G}>\varepsilon\Big]=0.$

To parse the above, the outer $\mathbb P[\cdot]$ refers to the choice of $\boldsymbol G$. The big outer $\langle\cdot\rangle_{\boldsymbol G}$ is the choice of the boundary condition called $\sigma$ above. Finally, $\langle\cdot\,|\,\mathcal F_\ell(\boldsymbol G,x)\rangle_{\boldsymbol G}$ is the random choice given the boundary condition.

Analogously, the planted distribution of $\underline M$ has non-reconstruction with respect to $p$ if for any $\varepsilon>0$ there exists $\ell>0$ such that

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(\hat{\boldsymbol G},x)\big\rangle_{\hat{\boldsymbol G}}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\big\|_{\mathrm{TV}}\Big\rangle_{\hat{\boldsymbol G}}>\varepsilon\Big]=0.$

Theorem 4.5

If $\underline M$ has non-reconstruction with respect to $p$, then $\underline M$ is p-symmetric. If the planted distribution of $\underline M$ has non-reconstruction with respect to $p$, then the planted distribution is p-symmetric.

In concrete applications the non-reconstruction condition is typically reasonably easy to verify. For instance, in [11] we determine the precise location of the so-called "condensation phase transition" in the regular k-SAT model via Theorems 4.4 and 4.5. The proof of Theorem 4.5 can be found in Section 4.7.
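The following toy computation (ours; ferromagnetic-Ising-style weights, chosen arbitrarily) makes the non-reconstruction quantity tangible: on a small tree it averages, over a Gibbs-typical boundary condition on the leaves, the total variation shift that conditioning on the boundary induces on the root marginal.

```python
import numpy as np
from itertools import product

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# complete binary tree of depth 2, root 0, leaves 3..6 (couplings ours)
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
eps = 0.3  # edge noise; smaller eps = stronger correlation
psi = np.array([[1 - eps, eps], [eps, 1 - eps]])
n, q = 7, 2
states = list(product(range(q), repeat=n))
w = np.array([np.prod([psi[s[u], s[v]] for u, v in edges]) for s in states])
mu = w / w.sum()

root = np.zeros(q)
for s, p in zip(states, mu):
    root[s[0]] += p

# average TV-shift of the root marginal caused by a Gibbs-typical boundary
leaves = [3, 4, 5, 6]
shift, seen = 0.0, {}
for s, p in zip(states, mu):
    b = tuple(s[x] for x in leaves)
    if b not in seen:
        cond = np.zeros(q)
        for t, pt in zip(states, mu):
            if tuple(t[x] for x in leaves) == b:
                cond[t[0]] += pt
        seen[b] = tv(cond / cond.sum(), root)
    shift += p * seen[b]
print(shift)  # decays with depth in the non-reconstruction regime
```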

4.4. Gibbs Uniqueness

Although the non-reconstruction condition is reasonably handy, to verify it we still need to "touch" the complex random graph $\boldsymbol G$. Ideally, we might hope for a condition that can be stated solely in terms of the limiting distribution $\vartheta$ on trees, which is conceptually far more accessible. The "Gibbs uniqueness" condition as put forward in [34] fits this bill.

Specifically, suppose that $T$ is a finite acyclic template whose root $r_T$ is a variable node. Then we say that $T$ is $(\varepsilon,\ell)$-unique with respect to a marginal assignment $p$ if

$\big\|\big\langle\sigma[\cdot|r_T]\,\big|\,\mathcal F_\ell(T,r_T)\big\rangle_T-p_T\big\|_{\mathrm{TV}}<\varepsilon.$ (4.5)

To parse (4.5), we observe that $\langle\sigma[\cdot|r_T]\,|\,\mathcal F_\ell(T,r_T)\rangle_T$ is a random variable, namely the average of the value $\sigma(r_T)$ assigned to the root variable under the Gibbs measure $\mu_T$ given the values of the variables at distance at least $\ell$ from $r_T$. Hence, (4.5) requires that $\langle\sigma[\cdot|r_T]\,|\,\mathcal F_\ell(T,r_T)\rangle_T$ is at total variation distance less than $\varepsilon$ from $p_T$ for every possible assignment of the variables at distance at least $\ell$ from $r_T$, i.e., for every "boundary condition".

More generally, we say that $T\in\mathcal T_V$ is $(\varepsilon,\ell)$-unique with respect to $p$ if the finite template $\partial^{\ell+1}T$ has this property. (That $\partial^{\ell+1}T$ is finite follows once more from the fact that all degrees are bounded by $\Delta$.) Further, we call the measure $\vartheta\in\mathcal P(\mathcal T)$ Gibbs-unique with respect to $p$ if for any $\varepsilon>0$ we have

$\lim_{\ell\to\infty}\mathbb P\big[\boldsymbol T\ \text{is }(\varepsilon,\ell)\text{-unique w.r.t. }p\big]=1.$
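On a path, the $(\varepsilon,\ell)$-uniqueness condition can be checked exactly by maximising the boundary influence over all boundary values, as in the following sketch (ours; the edge weights are an arbitrary choice); the worst-case total variation distance decays geometrically in the depth, which is the hallmark of Gibbs uniqueness.

```python
import numpy as np

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# a path of length `depth` with the root at one end (edge weights ours):
# worst-case influence of the boundary value on the root marginal, i.e.
# the quantity required to be below eps by (eps, l)-uniqueness
eps = 0.3
psi = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = 2
uniform = np.ones(q) / q  # unconditional root marginal, by symmetry here

def root_given_boundary(depth, b):
    msg = np.eye(q)[b]          # pin the boundary variable to value b
    for _ in range(depth):      # propagate the condition edge by edge
        msg = psi @ msg
    return msg / msg.sum()

for depth in (1, 2, 4, 8):
    worst = max(tv(root_given_boundary(depth, b), uniform) for b in range(q))
    print(depth, worst)  # decays geometrically: Gibbs uniqueness holds
```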

Corollary 4.6

If $\vartheta\in\mathcal P(\mathcal T)$ is Gibbs-unique with respect to $p$, then

$\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]=\mathcal B_\vartheta(p).$

If $\vartheta$ is Gibbs-unique with respect to $p$, then (3.9) guarantees that $\underline M$ has non-reconstruction with respect to $p$. Indeed, given $\varepsilon>0,\ell>0$ and a graph $G$ let $\mathcal U(G,\varepsilon,\ell)$ denote the set of vertices $x\in V_n$ for which $\partial^{\ell+1}[G,x]$ is acyclic and $(\varepsilon,\ell)$-unique. Then, as each $x\in\mathcal U(G,\varepsilon,\ell)$ contributes at most $\varepsilon$ while the remaining terms are bounded by one, we have

$\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_G\leq\varepsilon+\Big(1-\frac{|\mathcal U(G,\varepsilon,\ell)|}{n}\Big),$

and by (3.9) the probability $\mathbb P[|\mathcal U(\boldsymbol G,\varepsilon,\ell)|\leq(1-\varepsilon)n]$ tends to 0 as $n\to\infty$. Similarly, because the distribution of the depth-$\ell$ neighborhood structure in the planted distribution $\hat{\boldsymbol G}$ coincides with $\vartheta$, Gibbs-uniqueness implies that the planted model has non-reconstruction with respect to $p$ as well. Therefore, the assertion follows from Theorems 4.4 and 4.5.

In problems such as the random k-SAT model, the Ising model or the Potts antiferromagnet that come with an "inverse temperature" parameter $\beta\geq0$, Gibbs uniqueness is always satisfied for sufficiently small values of $\beta$. Consequently, Corollary 4.6 shows that the cavity method always yields the correct value of $\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]$ in the case of small $\beta$, the so-called "high temperature" case in physics jargon. Furthermore, if the Gibbs uniqueness condition is satisfied then there is a canonical way of constructing the marginal assignment $p$ by means of the Belief Propagation algorithm [36, Chapter 14]. Hence, Corollary 4.6 provides a comprehensive answer in this case.

4.5. Meet the Expectation

In this section we lay the groundwork for proving Theorem 4.4. In particular, the conditions MA1–MA3 will be used in the proofs of Corollaries 4.8 and 4.9 in this section, which will be vital to the proof of Theorem 4.4 in Section 4.6. To proceed, we need to get a handle on the conditional expectation of $Z$ given $\mathfrak T_\ell$, and for this purpose we need to study the possible empirical distributions of the values assigned to the variables of a concrete factor graph $G\in\mathcal G(M_n)$. Specifically, by a $(G,\ell)$-marginal sequence we mean a map $q:\mathcal T_\ell\to\bigcup_{j=1}^\Delta\mathcal P(\Omega^j),\ T\mapsto q_T$ such that

  • 1

    MS1: $q_T\in\mathcal P(\Omega)$ if $T\in\mathcal T_V\cap\mathcal T_\ell$,

  • 2

    MS2: $q_T\in\mathcal P(\Omega^{d_T})$ if $T\in\mathcal T_F\cap\mathcal T_\ell$,

  • 3
    MS3: for all $T\in\mathcal T_V\cap\mathcal T_\ell$ we have, with $q_{T'}^j$ denoting the $j$th marginal of $q_{T'}$,
    $\sum_{T'\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_{T'}]}\lambda_{G,\ell}(T')\,\mathbf 1\{T'_j=T\}\,\big(q_{T'}^j-q_T\big)=0.$ (4.6)

Thus, $q$ assigns each tree $T\in\mathcal T_\ell$ rooted at a variable node a distribution on $\Omega$ and each tree $T\in\mathcal T_\ell$ rooted at a constraint node a distribution on $\Omega^{d_T}$, just like in Definition 4.1. Furthermore, the consistency condition (4.6) provides that for a given $T$ rooted at a variable the average marginal distribution over all $T',j$ such that $T'_j=T$ is equal to $q_T$. However, in contrast to condition MA2 from Definition 4.1, MS3 does not require this marginalisation to work out for every $T',j$ individually.

Suppose now that $U\subseteq F_n$ is a set of constraint nodes such that $d(a)=d_0$ for all $a\in U$. Then for $\sigma:V_n\to\Omega$ we let

$\sigma[(\omega_1,\dots,\omega_{d_0})|U]=\frac1{|U|}\sum_{a\in U}\prod_{j=1}^{d_0}\mathbf 1\{\sigma(\partial(G,a,j))=\omega_j\}.$

Thus, $\sigma[\cdot|U]\in\mathcal P(\Omega^{d_0})$ is the empirical distribution of the sequences $\{(\sigma(\partial(G,a,1)),\dots,\sigma(\partial(G,a,d_0))):a\in U\}$. A factor graph $G$ and $\sigma:V_n\to\Omega$ induce a $(G,\ell)$-marginal sequence $q_{G,\sigma,\ell}$ canonically, namely the empirical distributions

$q_{G,\sigma,\ell,T}=\sigma[\cdot|\{x\in V_n:\partial^\ell[G,x]=T\}]\quad\text{for }T\in\mathcal T_V\cap\mathcal T_\ell,$
$q_{G,\sigma,\ell,T}=\sigma[\cdot|\{a\in F_n:\partial^{\ell+1}[G,a]=T\}]\quad\text{for }T\in\mathcal T_F\cap\mathcal T_\ell.$

Conversely, given a $(G,\ell)$-marginal sequence $q$ let $\Sigma(G,\ell,q,\delta)$ be the set of all $\sigma:V_n\to\Omega$ such that for all $T\in\mathcal T_V\cap\mathcal T_\ell$, $T'\in\mathcal T_F\cap\mathcal T_\ell$ we have

$\|q_{G,\sigma,\ell,T}-q_T\|_{\mathrm{TV}}\leq\delta,\qquad\|q_{G,\sigma,\ell,T'}-q_{T'}\|_{\mathrm{TV}}\leq\delta.$ (4.7)

Moreover, let

$Z_{\ell,q,\delta}(G)=Z(G)\,\big\langle\mathbf 1\{\sigma\in\Sigma(G,\ell,q,\delta)\}\big\rangle_G.$

Finally, define

$\mathcal B_{G,\ell}(q)=\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}(1-d_T)H(q_T)\,\lambda_{G,\ell}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,\ell}(T|\mathcal T_F).$
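The functional $\mathcal B_{G,\ell}(q)$ is a finite sum and thus straightforward to evaluate once the empirical data are collected; the following sketch (ours; all argument names are our own) spells this out.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def bethe_functional(var_terms, fac_terms, fac_var_ratio):
    """Evaluate B_{G,l}(q) from empirical data (a sketch; argument names
    ours).  var_terms: list of (lambda_T, d_T, q_T) for variable trees;
    fac_terms: list of (lambda_T, q_T, ln_psi_T) for constraint trees,
    where ln_psi_T is the table of ln psi_T on Omega^{d_T};
    fac_var_ratio = |F_n| / |V_n|."""
    val = sum(lam * (1 - d) * H(qT) for lam, d, qT in var_terms)
    val += fac_var_ratio * sum(
        lam * (H(qT) + (np.asarray(qT) * lnpsi).sum())
        for lam, qT, lnpsi in fac_terms)
    return val

# toy instance: one variable tree of degree 2, one binary constraint tree
qv = np.array([0.5, 0.5])
qf = np.array([[0.3, 0.2], [0.2, 0.3]])
lnpsi = np.log(np.array([[2.0, 1.0], [1.0, 2.0]]))
print(bethe_functional([(1.0, 2, qv)], [(1.0, qf, lnpsi)], 1.0))
```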

In Section 5 we are going to prove the following formula for the expectation of $Z_{\ell,q,\delta}(\boldsymbol G)$.

Proposition 4.7

For any $\varepsilon>0,\ell>0$ there is $\delta>0$ such that for large enough $n$ the following is true. Assume that $G\in\mathcal G(M_n)$ is $100\ell$-acyclic and let $q$ be a $(G,\ell)$-marginal sequence. Then

$\Big|n^{-1}\ln\mathbb E\big[\mathbf 1\{\mathcal A_{2\ell+5}\}Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]-\mathcal B_{G,\ell}(q)\Big|<\varepsilon,$

where $\boldsymbol G\cong_\ell G$ denotes the event that the depth-$\ell$ local structure of $\boldsymbol G$ coincides with that of $G$.

We are going to be particularly interested in the expectation of $Z_{\ell,q,\delta}(\boldsymbol G)$ for $q$ "close" to a specific marginal assignment $p$ (in the sense of Definition 4.1). Formally, a $(G,\ell)$-marginal sequence $q$ is $(\varepsilon,\ell)$-judicious with respect to $p$ if

$\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}\lambda_{G,\ell}[T|\mathcal T_V]\,\|q_T-p_{\ell,T}\|_{\mathrm{TV}}+\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\lambda_{G,\ell}[T|\mathcal T_F]\,\|q_T^j-p_{\ell,T_j}\|_{\mathrm{TV}}<\varepsilon.$

We say that $(G,\sigma)$ is $(\varepsilon,\ell)$-judicious with respect to $p$ if the empirical distribution $q_{G,\sigma,\ell}$ is $(\varepsilon,\ell)$-judicious w.r.t. $p$.

Let us explain this definition briefly. Suppose we are given a factor graph $G$ and a certain "depth" $\ell$. Then for an assignment $\sigma$ we can jot down the empirical distribution of the values assigned to the variables $x$ with $\partial^\ell[G,x]=T$ for each tree $T\in\mathcal T_V\cap\mathcal T_\ell$. The "judicious" condition essentially provides that these empirical distributions are fairly "homogeneous". That is, if we refine our classification of variable nodes according to the depth-$(\ell+1)$ structure $\partial^{\ell+1}[G,x]$, then the resulting empirical distributions are close to the coarser ones obtained at level $\ell$. Clearly, this condition is closely related to the MA2 condition from Definition 4.1.
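In miniature, the empirical distributions $q_{G,\sigma,\ell,T}$ and their refinement at the next depth can be tabulated as follows (a sketch of ours, with the variable's degree and the sorted neighbour degrees serving as crude stand-ins for the depth-$\ell$ and depth-$(\ell+1)$ classes).

```python
from collections import Counter, defaultdict

# empirical marginals of an assignment, grouped by a crude local-type
# proxy -- an illustration of q_{G,sigma,l}, not the real construction
edges = [(0, 'a'), (1, 'a'), (1, 'b'), (2, 'b'), (3, 'b'), (3, 'c'), (4, 'c')]
sigma = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}

vdeg = Counter(v for v, _ in edges)
fdeg = Counter(f for _, f in edges)
t1 = {v: vdeg[v] for v in sigma}                              # "depth l"
t2 = {v: (vdeg[v], tuple(sorted(fdeg[f] for w, f in edges if w == v)))
      for v in sigma}                                         # "depth l+1"

def emp(types):  # empirical value distribution within each class
    g = defaultdict(list)
    for v, t in types.items():
        g[t].append(sigma[v])
    return {t: Counter(vals) for t, vals in g.items()}

print(emp(t1))
print(emp(t2))  # the refinement; judiciousness compares the two levels
```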

Before we prove Theorem 4.4 we deduce two corollaries to Proposition 4.7. The first one gives an upper bound on the “judicious part” of E[Z(G)]. The second one yields a lower bound.

Corollary 4.8

Suppose that $p$ is a marginal assignment. For any $\alpha>0$ there exist $\varepsilon>0,\ell>0$ such that for all $0<\beta,\gamma<\varepsilon$ and all $l\geq\ell$ the following is true. Let $\mathcal L(\gamma,l)$ be the event that $\|\lambda_{\boldsymbol G,l}-\vartheta_l\|_{\mathrm{TV}}<\gamma$. Then

$\limsup_{n\to\infty}\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\gamma,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\big]\leq\mathcal B_\vartheta(p)+\alpha.$

Pick a small enough $\varepsilon=\varepsilon(\alpha)>0$. By Lemma 4.3 there exists $\ell$ such that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ for all $l\geq\ell$. Now, fix any $0<\beta,\gamma<\varepsilon$ and $l\geq\ell$, pick $\xi=\xi(\beta,l)$ small enough and assume that $n$ is big enough. Let $Q(G)$ be the set of all $(G,l)$-marginal sequences that are $(\beta,l)$-judicious w.r.t. $p$. Because $\mathcal T_l$ is a finite set, there exists a number $N=N(\xi)$ such that for every factor graph $G$ there is a subset $Q^*(G)\subseteq Q(G)$ of size $|Q^*(G)|\leq N$ such that the following is true. If $(G,\sigma)$ is $(\beta,l)$-judicious w.r.t. $p$, then $\sigma\in\bigcup_{q\in Q^*(G)}\Sigma(G,l,q,\xi)$. Therefore, for all $G$ we have

$Z(G)\big\langle\mathbf 1\{(G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_G\leq N\max_{q\in Q(G)}Z_{l,q,\xi}(G).$ (4.8)

Proposition 4.7 and (4.8) imply that for $\xi$ small enough and $n$ large enough for any factor graph $G\in\mathcal A_{100l}$ there is $q^G\in Q(G)$ such that

$n^{-1}\ln\mathbb E\big[\mathbf 1\{\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\,\big|\,\boldsymbol G\cong_l G\big]\leq\mathcal B_{G,l}(q^G)+\alpha/2$
$\qquad\leq\alpha/2+\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T^G)\lambda_{G,l}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T^G)+\langle\ln\psi_T(\sigma)\rangle_{q_T^G}\big]\lambda_{G,l}(T|\mathcal T_F).$ (4.9)

Further, for any $j\in[\Delta]$ the function $\nu\in\mathcal P(\Omega^j)\mapsto H(\nu)$ is uniformly continuous because $\mathcal P(\Omega^j)$ is compact. By the same token, $\nu\mapsto\langle\ln\psi(\sigma)\rangle_\nu$ is uniformly continuous for any $\psi\in\Psi$. Consequently, if $G\in\mathcal L(\gamma,l)$ for some $\gamma<\varepsilon$ and $\varepsilon$ is chosen small enough, then we obtain

$\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T^G)\lambda_{G,l}(T|\mathcal T_V)<\mathbb E\big[(1-d_{\boldsymbol T})H(p_{\boldsymbol T})\,\big|\,\boldsymbol T\in\mathcal T_V\big]+\alpha/4.$ (4.10)

Similarly, because our choice of $\ell$ ensures that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ and because of the uniform degree bound $\Delta$, condition MA2 from Definition 4.1 implies that

$\mathbb E\Big[\sum_{j=1}^{d_{\boldsymbol T}}\|p_{l,\boldsymbol T_j}-p_{\boldsymbol T_j}\|_{\mathrm{TV}}\,\Big|\,\boldsymbol T\in\mathcal T_F\Big]<\varepsilon^{1/4}.$

Moreover, for any $\psi\in\Psi$ and any $\nu_1,\dots,\nu_{d_\psi}\in\mathcal P(\Omega)$ there is a unique distribution $\hat\nu\in\mathcal P(\Omega^{d_\psi})$ with marginals $\hat\nu^j=\nu_j$ that maximises $H(\hat\nu)+\langle\ln\psi(\sigma)\rangle_{\hat\nu}$ because the entropy is concave and the map $\mu\mapsto\langle\ln\psi(\sigma)\rangle_\mu$ is linear. In fact, the map $(\nu_1,\dots,\nu_{d_\psi})\mapsto\hat\nu$ is uniformly continuous. Therefore, MA2 and MA3 show that for $G\in\mathcal L(\gamma,l)$,

$\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T^G)+\langle\ln\psi_T(\sigma)\rangle_{q_T^G}\big]\lambda_{G,l}(T|\mathcal T_F)<\mathbb E\big[H(p_{\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\,\big|\,\boldsymbol T\in\mathcal T_F\big]+\alpha/4.$ (4.11)

Combining (4.9), (4.10) and (4.11), we get

$\ln\mathbb E\big[\mathbf 1\{\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\,\big|\,\boldsymbol G\cong_l G\big]\leq n\big(\mathcal B_\vartheta(p)+\alpha\big).$ (4.12)

Finally, the assertion follows from (4.12) and Bayes’ rule.

Corollary 4.9

Suppose that $p$ is a marginal assignment. For any $\alpha>0$ there exists $\ell>0$ such that for all $l\geq\ell$,

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\ln\mathbb E[Z(\boldsymbol G)\,|\,\mathfrak T_l]\leq\mathcal B_\vartheta(p)-\alpha\,\Big|\,\mathcal A_{100l}\Big]=0.$

Choose a small $\varepsilon=\varepsilon(\alpha)>0$. By Lemma 4.3 there exists $\ell$ such that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ for all $l\geq\ell$. Hence, fix some $l\geq\ell$ and define

$q:T\in\mathcal T_l\cap\mathcal T_V\mapsto\mathcal P(\Omega),\qquad T\mapsto p_{l,T}.$

Moreover, for $T\in\mathcal T_l\cap\mathcal T_F$ let $q_T\in\mathcal P(\Omega^{d_T})$ be the (unique) distribution that maximises $H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}$ subject to the condition that $q_T^j=q_{T_j}$ for all $j\in[d_T]$ (cf. (4.1)). Then $q$ is a $(G,l)$-marginal sequence. Indeed, MS1, MS2 are trivially satisfied and MS3 holds because $q_T^j=q_{T_j}$ for all $T\in\mathcal T_l\cap\mathcal T_F,j\in[d_T]$. Further, if we pick $\delta=\delta(\varepsilon,l)>0$ small enough, then Proposition 4.7 implies that for large $n$ and any $G\in\mathcal A_{100l}$

$n^{-1}\ln\mathbb E\big[Z_{l,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_l G\big]\geq\mathcal B_{G,l}(q)-\alpha/2=-\alpha/2+\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T)\lambda_{G,l}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,l}(T|\mathcal T_F).$ (4.13)

To complete the proof, we need to compare the r.h.s. of (4.13) with $\mathcal B_\vartheta(p)$. Thus, let us write

$\beta_1(G)=\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T)\lambda_{G,l}(T|\mathcal T_V),$
$\beta_2(G)=\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,l}(T|\mathcal T_F).$

Because $\|\vartheta_l-\lambda_{\boldsymbol G,l}\|_{\mathrm{TV}}<\varepsilon$ w.h.p. by (3.9), $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ by our choice of $\ell$, the entropy is uniformly continuous on $\mathcal P(\Omega)$ and $d_T\leq\Delta$ is uniformly bounded for all $T$, (3.11) ensures that we can make $\varepsilon$ so small that

$\lim_{n\to\infty}\mathbb P\Big[\mathbb E\big[\big|\beta_1(\boldsymbol G)-\mathbb E[(1-d_{\boldsymbol T})H(p_{\boldsymbol T})\,|\,\boldsymbol T\in\mathcal T_V]\big|\,\big|\,\mathfrak T_l\big]>\alpha/4\,\Big|\,\mathcal A_{100l}\Big]=0.$ (4.14)

A similar argument applies to $\beta_2(\boldsymbol G)$. Indeed, since $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ and because all degrees are bounded by $\Delta$, condition MA2 from Definition 4.1 implies that

$\mathbb E\Big[\sum_{j=1}^{d_{\boldsymbol T}}\|p_{l,\boldsymbol T_j}-p_{\boldsymbol T_j}\|_{\mathrm{TV}}\,\Big|\,\boldsymbol T\in\mathcal T_F\Big]<\varepsilon^{1/4}$

provided $\varepsilon$ is small enough. In effect, because for any $\psi\in\Psi$ the function $\mu\in\mathcal P(\Omega^{d_\psi})\mapsto H(\mu)+\langle\ln\psi(\sigma)\rangle_\mu$ is uniformly continuous, MA3 and the construction of $q_T$ for $T\in\mathcal T_l\cap\mathcal T_F$ ensure that

$\mathbb E\big[\big|H(q_{\partial^{l+1}\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{q_{\partial^{l+1}\boldsymbol T}}-H(p_{\boldsymbol T})-\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\big|\,\big|\,\boldsymbol T\in\mathcal T_F\big]<\alpha/8.$ (4.15)

Since $\|\vartheta_l-\lambda_{\boldsymbol G,l}\|_{\mathrm{TV}}<\varepsilon$ w.h.p. by (3.9), (4.15) implies that

$\lim_{n\to\infty}\mathbb P\Big[\mathbb E\big[\big|\beta_2(\boldsymbol G)-\mathbb E[H(p_{\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\,|\,\boldsymbol T\in\mathcal T_F]\big|\,\big|\,\mathfrak T_l\big]>\alpha/4\,\Big|\,\mathcal A_{100l}\Big]=0.$ (4.16)

Finally, the assertion follows from (4.13), (4.14) and (4.16).

4.6. Proof of Theorem 4.4

We begin by spelling out the following consequence of the symmetry assumption. Let p be a marginal assignment.

Lemma 4.10

If $\underline M$ is p-symmetric, then for any $\varepsilon>0$ and all sufficiently large $\ell$ we have

$\lim_{n\to\infty}\mathbb P\Big[\sum_{x\in V_n}\|\mu_{\boldsymbol G,x}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\|_{\mathrm{TV}}>\varepsilon n\Big]=\lim_{n\to\infty}\mathbb P\big[\mu_{\boldsymbol G}\ \text{fails to be }(\varepsilon,2)\text{-symmetric}\big]=0;$ (4.17)

if the planted distribution of $\underline M$ is p-symmetric as well, then additionally

$\lim_{n\to\infty}\mathbb P\Big[\sum_{x\in V_n}\|\mu_{\hat{\boldsymbol G},x}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\|_{\mathrm{TV}}>\varepsilon n\Big]=\lim_{n\to\infty}\mathbb P\big[\mu_{\hat{\boldsymbol G}}\ \text{fails to be }(\varepsilon,2)\text{-symmetric}\big]=0.$ (4.18)

Choose $\eta=\eta(\varepsilon)>0$ small enough. For an integer $\ell>0$ consider the event

$\mathcal E_\ell=\Big\{\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}<\eta^2n^2\Big\}.$

If $\underline M$ is p-symmetric, then $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal E_\ell]=1$ for sufficiently large $\ell$. Similarly, if the planted distribution is p-symmetric, then $\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}\in\mathcal E_\ell]=1$ for large $\ell$.

Hence, assume that $G\in\mathcal E_\ell$. Then by the triangle inequality, for any $\omega\in\Omega$,

$\frac1n\sum_{x\in V_n}\big|p_{\ell,\partial^\ell[G,x]}(\omega)-\mu_{G,x}(\omega)\big|=\frac1{n^2}\sum_{x\in V_n}\Big|\Big[\sum_{y\in V_n}\sum_{\omega'\in\Omega}p_{\ell,\partial^\ell[G,x]}(\omega)\,p_{\ell,\partial^\ell[G,y]}(\omega')\Big]-\Big[\sum_{y\in V_n}\sum_{\omega'\in\Omega}\mu_{G,\{x,y\}}(\omega,\omega')\Big]\Big|\leq\eta^2.$

Therefore,

$\frac1n\sum_{x\in V_n}\|p_{\ell,\partial^\ell[G,x]}-\mu_{G,x}\|_{\mathrm{TV}}\leq\eta^2|\Omega|<\eta.$ (4.19)

Furthermore, by (4.19) and the triangle inequality,

$\frac1{n^2}\sum_{x,y\in V_n}\|\mu_{G,x}\otimes\mu_{G,y}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}\leq2\eta.$ (4.20)

Since $G\in\mathcal E_\ell$, (4.20) entails that

$\frac1{n^2}\sum_{x,y\in V_n}\|\mu_{G,x}\otimes\mu_{G,y}-\mu_{G,\{x,y\}}\|_{\mathrm{TV}}\leq3\eta<\varepsilon,$

i.e., $G$ is $(\varepsilon,2)$-symmetric.

Together with Lemma 4.10 the following lemma shows that under the assumptions of Theorem 4.4 the partition function is dominated by its judicious part.

Lemma 4.11

There is a number $\varepsilon_0=\varepsilon_0(\Delta,\Omega,\Psi,\Theta)$ such that for all $0<\varepsilon<\varepsilon_0,\ell>0$ there exists $\chi>0$ such that for large enough $n$ the following is true. If $G\in\mathcal G(M_n)$ is a $(2\ell+5)$-acyclic factor graph such that

$\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon^3n$ (4.21)

and $\mu_G$ is $(\chi,2)$-symmetric, then

$\big\langle\mathbf 1\{(G,\sigma)\ \text{is }(\varepsilon,\ell)\text{-judicious w.r.t. }p\}\big\rangle_G\geq1/2.$

Pick $\delta=\delta(\ell,\varepsilon)>0$ small, $\beta=\beta(\delta)$ and $\gamma=\gamma(\beta)$ smaller and $\chi=\chi(\gamma)>0$ smaller still and assume that $n>n_0(\chi)$. Let $V_0$ be the partition of $V_n$ such that $x,y\in V_n$ belong to the same class iff $\partial^{\ell+2}[G,x]=\partial^{\ell+2}[G,y]$. By Theorem 2.1 there exists a refinement $V$ of $V_0$ such that $\mu_G$ is $\gamma$-homogeneous with respect to $(V,S)$ for some partition $S$ of $\Omega^n$ such that $\#V+\#S\leq N=N(\gamma)$. We may index the classes of $V$ as $V_{T,i}$ with $T=\partial^{\ell+2}[G,x]$ for all $x$ in the class and $i\in[N_T]$ for some integer $N_T$.

Let $J$ be the set of all $j\in[\#S]$ such that $\mu_G(S_j)\geq\delta^6/N$ and $\mu_G[\cdot|S_j]$ is $\gamma$-regular. Then

$\sum_{j\in J}\mu_G(S_j)\geq1-\delta^6.$ (4.22)

Choosing $\chi$ small enough, we obtain from Corollary 2.4 that

$\frac1n\sum_{x\in V_n}\|\mu_{G,x}[\cdot|S_j]-\mu_{G,x}\|_{\mathrm{TV}}<\delta^7\qquad\text{for all }j\in J.$

Therefore, by (4.21) and the triangle inequality, for $j\in J$ we get

$\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\,\big|\,S_j\big\rangle_G\leq\delta^7+\frac1n\sum_{T,i}|V_{T,i}|\,\big\|\big\langle\sigma[\cdot|V_{T,i}]\,\big|\,S_j\big\rangle_G-p_{\ell,T}\big\|_{\mathrm{TV}}\qquad\text{[by HM2]}$
$\qquad\leq\delta^7+\frac1n\sum_{T,i}\sum_{x\in V_{T,i}}\big\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}<3\varepsilon^3.$

Consequently, by (4.22), Bayes' rule and the triangle inequality, summing on $j$ and using that the total variation distance is bounded by one, we get

$\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\big\rangle_G\leq3\varepsilon^3+\delta^6<4\varepsilon^3.$ (4.23)

Applying the triangle inequality once more, we find

$\Big\langle\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}\lambda_{G,\ell}[T|\mathcal T_V]\,\|q_{G,\sigma,\ell,T}-p_{\ell,T}\|_{\mathrm{TV}}\Big\rangle_G\leq\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\big\rangle_G<4\varepsilon^3.$ (4.24)

Further, consider $T\in\mathcal T_F\cap\mathcal T_\ell$ such that $\lambda_{G,\ell}[T|\mathcal T_F]>0$ and let $j\in[d_T]$. Because $G$ is $(2\ell+5)$-acyclic, there exists a set $\Gamma(T,j)\subseteq\mathcal T_{\ell+2}\cap\mathcal T_V$ with the following two properties. First, for every constraint node $a$ with $\partial^{\ell+1}[G,a]=T$ the variable node $x=\partial(G,a,j)$ satisfies $\partial^{\ell+2}[G,x]\in\Gamma(T,j)$. Second, for every variable node $x$ with $\partial^{\ell+2}[G,x]\in\Gamma(T,j)$ there is a constraint node $a$ with $\partial^{\ell+1}[G,a]=T$ such that $\partial(G,a,j)=x$. For $R\in\Gamma(T,j)$ let $m_{R,T,i,j}$ be the number of constraint nodes $a$ with $\partial^{\ell+1}[G,a]=T$ such that $x=\partial(G,a,j)$ belongs to $V_{R,i}$. Then by the triangle inequality,

$\Big\langle\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\lambda_{G,\ell}[T|\mathcal T_F]\,\|q_{G,\sigma,\ell,T}^j-p_{\ell,T_j}\|_{\mathrm{TV}}\Big\rangle_G\leq\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\sum_{R\in\Gamma(T,j)}\sum_{i\in[N_R]}\frac{m_{R,T,i,j}}{|F_n|}\big\langle\|\sigma[\cdot|V_{R,i}]-p_{\ell,R}\|_{\mathrm{TV}}\big\rangle_G$
$\qquad\leq\frac{\Delta^2}{n}\sum_{R\in\mathcal T_{\ell+2}\cap\mathcal T_V}\sum_{i\in[N_R]}|V_{R,i}|\,\big\langle\|\sigma[\cdot|V_{R,i}]-p_{\ell,R}\|_{\mathrm{TV}}\big\rangle_G;$ (4.25)

the last inequality follows because all degrees are between one and $\Delta$. Finally, the assertion follows from (4.24) and (4.25).

We proceed by proving the upper bound and the lower bound statement from Theorem 4.4 separately. Strictly speaking, the proof of the lower bound implies the upper bound as well. But presenting the arguments separately makes them slightly easier to follow.

Proof of Theorem 4.4

upper bound. We assume that $\underline M$ is p-symmetric. Pick and fix a number $\alpha>0$; we aim to show that for large enough $n$,

$\frac1n\mathbb E[\ln Z(\boldsymbol G)]\leq\mathcal B_\vartheta(p)+4\alpha.$ (4.26)

For $\varepsilon,l>0$ let

$\mathcal R(\varepsilon,l)=\Big\{\sum_{x\in V_n}\|\mu_{\boldsymbol G,x}-p_{l,\partial^l[\boldsymbol G,x]}\|_{\mathrm{TV}}<\varepsilon n\Big\}.$

Additionally, let $\mathcal S(\chi)$ be the event that $\mu_{\boldsymbol G}$ is $(\chi,2)$-symmetric and let $\mathcal L(\varepsilon,l)$ be the event that $\|\lambda_{\boldsymbol G,l}-\vartheta_l\|_{\mathrm{TV}}<\varepsilon$. Corollary 4.8 shows that for some small enough $\varepsilon>0$ and large enough $\ell$ (both dependent on $\alpha$), for all $l\geq\ell$ and large enough $n$ we have

$\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\big]\leq\mathcal B_\vartheta(p)+\alpha.$ (4.27)

To apply this bound we are going to argue that $Z(\boldsymbol G)\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p\}\rangle_{\boldsymbol G}$ is not much smaller than $Z(\boldsymbol G)$ for most $\boldsymbol G$.

The proof of this fact is based on Lemma 4.11. To apply it, we need to pick and fix some specific, large enough $\ell^*>\ell$ (upon which the value of $\chi$ provided by Lemma 4.11 will depend). By Lemma 4.3 there is $\ell_1\geq\ell$ such that

$\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon^4\qquad\text{for all }l\geq\ell_1.$ (4.28)

Further, by Lemma 4.10 there is $\ell_2\geq\ell$ such that

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R(\varepsilon^3,l)]=1\qquad\text{for all }l\geq\ell_2.$

Let $\ell^*=\ell+\ell_1+\ell_2$. Now, Lemma 4.11 yields $\chi=\chi(\varepsilon,\ell^*)>0$ such that the following is true. Consider the event $\mathcal U=\mathcal S(\chi)\cap\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal R(\varepsilon^3,\ell^*)$. Then

$\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\leq2\cdot\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}.$ (4.29)

Combining (4.27) and (4.29) we obtain

$\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\big]\leq2\exp\big(n(\mathcal B_\vartheta(p)+\alpha)\big).$ (4.30)

Hence, we are left to estimate the probability of the event $\mathcal U\cap\mathcal A_{100\ell^*}$. With respect to $\mathcal U$ we obtain from Lemma 4.10 that $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal S(\chi)]=1$ and $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R(\varepsilon^3,\ell^*)]=1$. Moreover, the local convergence assumption (3.9) implies that $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal L(\varepsilon^4,\ell^*)]=1$. Consequently,

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U]=1.$ (4.31)

Hence, the high girth assumption (3.11) yields

$\liminf_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}]>0.$ (4.32)

Finally, combining (4.30) and (4.32) and using Markov's inequality, we obtain

$\lim_{n\to\infty}\mathbb P\big[Z(\boldsymbol G)>\exp(n(\mathcal B_\vartheta(p)+2\alpha))\,\big|\,\mathcal U\cap\mathcal A_{100\ell^*}\big]=0.$

Further, since (4.32) shows that the probability of the event $\mathcal U\cap\mathcal A_{100\ell^*}$ is bounded away from 0, Proposition 3.2 yields

$\lim_{n\to\infty}\mathbb P\big[Z(\boldsymbol G)>\exp(n(\mathcal B_\vartheta(p)+3\alpha))\big]=0.$ (4.33)

Because $|n^{-1}\ln Z(\boldsymbol G)|$ is bounded by some number $C=C(\Delta,\Omega,\Psi,\Theta)>0$ by the definition (3.4) of $Z$, (4.26) follows from (4.33).

To establish the lower bound we introduce a construction reminiscent of those used in [24, 25, 31, 39, 48]. Namely, starting from the sequence $\underline M$ of $(\Delta,\Omega,\Psi,\Theta)$-models, we define another sequence $\underline M^\otimes=(M_n^\otimes)_n$ of models as follows. Let $\Omega^\otimes=\Omega\times\Omega$ and let us denote pairs $(\omega,\omega')\in\Omega^\otimes$ by $\omega\omega'$. Further, for any $\psi:\Omega^h\to(0,\infty)$ we define a function

$\psi^\otimes:(\Omega^\otimes)^h\to(0,\infty),\qquad(\omega_1\omega_1',\dots,\omega_h\omega_h')\mapsto\psi(\omega_1,\dots,\omega_h)\cdot\psi(\omega_1',\dots,\omega_h').$

Let $\Psi^\otimes=\{\psi^\otimes:\psi\in\Psi\}$. Then the $(\Delta,\Omega,\Psi,\Theta)$-model $M_n=(V_n,F_n,d_n,t_n,(\psi_a)_{a\in F_n})$ gives rise to the $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-model $M_n^\otimes=(V_n,F_n,d_n,t_n,(\psi_a^\otimes)_{a\in F_n})$.

Clearly, there is a canonical bijection $\mathcal G(M)\to\mathcal G(M^\otimes),\ G\mapsto G^\otimes$. Moreover, the construction ensures that the Gibbs measure $\mu_{G^\otimes}\in\mathcal P((\Omega^\otimes)^n)$ equals $\mu_G\otimes\mu_G$. Explicitly, for all $\omega_1,\omega_1',\dots,\omega_n,\omega_n'\in\Omega$,

$\mu_{G^\otimes}(\omega_1\omega_1',\dots,\omega_n\omega_n')=\mu_G(\omega_1,\dots,\omega_n)\,\mu_G(\omega_1',\dots,\omega_n').$ (4.34)

In effect, we obtain

$Z(G^\otimes)=Z(G)^2.$ (4.35)

Further, writing $\mathcal G^\otimes,\mathcal T^\otimes$ for the $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-templates and the acyclic $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-templates, we can lift the marginal assignment $p$ from $\mathcal T$ to $\mathcal T^\otimes$ by letting $p^\otimes_{T^\otimes}=p_T\otimes p_T$ for all $T$. Additionally, let $\vartheta^\otimes\in\mathcal P(\mathcal T^\otimes)$ be the image of $\vartheta$ under the map $T\in\mathcal T\mapsto T^\otimes$ so that

$\mathcal B_{\vartheta^\otimes}(p^\otimes)=2\mathcal B_\vartheta(p).$ (4.36)
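The identity (4.35) is easy to verify by brute force on a toy instance; in the sketch below (ours), `lift` implements $\psi\mapsto\psi^\otimes$ with pairs $\omega\omega'$ encoded as integers in $\{0,\dots,3\}$.

```python
import numpy as np
from itertools import product

# brute-force check of Z(G^x) = Z(G)^2 on a toy model (weights ours)
psis = {(0, 1): np.array([[2., 1.], [1., 2.]]),
        (1, 2): np.array([[1., 3.], [3., 1.]])}
n = 3

def Z(weight_tables, qsize):
    tot = 0.0
    for s in product(range(qsize), repeat=n):
        w = 1.0
        for (u, v), tab in weight_tables.items():
            w *= tab[s[u], s[v]]
        tot += w
    return tot

def lift(tab):
    # psi^x(ww') = psi(w) * psi(w'), with pair (a, b) encoded as 2a + b
    out = np.zeros((4, 4))
    for a, b, c, d in product(range(2), repeat=4):
        out[2 * a + b, 2 * c + d] = tab[a, c] * tab[b, d]
    return out

psis2 = {f: lift(tab) for f, tab in psis.items()}
print(Z(psis, 2) ** 2, Z(psis2, 4))  # equal
```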

Proof of Theorem 4.4

lower bound. We assume that $\underline M$ is p-symmetric and that the same is true of the planted distribution. For $\varepsilon,l>0$ consider the event

$\mathcal R^\otimes(\varepsilon,l)=\Big\{\frac1n\sum_{x\in V_n}\|\mu_{\boldsymbol G^\otimes,x}-p^\otimes_{l,\partial^l[\boldsymbol G,x]^\otimes}\|_{\mathrm{TV}}<\varepsilon\Big\}$ (4.37)

and let $\mathcal S(\chi)$ be the event that $\mu_{\boldsymbol G^\otimes}$ is $(\chi,2)$-symmetric. Moreover, as before let $\mathcal L(\varepsilon,\ell)=\{\|\lambda_{\boldsymbol G,\ell}-\vartheta_\ell\|_{\mathrm{TV}}<\varepsilon\}$. Basically, we are going to apply the same argument as in the proof of the upper bound to the random factor graph $\boldsymbol G^\otimes$ and to $\hat{\boldsymbol G}_{\hat l}$ for a large enough $\hat l$.

Hence, let $\alpha>0$. Then Corollary 4.8 applied to $\underline M^\otimes$ yields a small $\varepsilon=\varepsilon(\alpha)>0$ and a large $\ell=\ell(\alpha)>0$ such that for all $l\geq\ell$ and large enough $n$ we have

$\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}\big]\leq\mathcal B_{\vartheta^\otimes}(p^\otimes)+\alpha.$ (4.38)

Further, by Lemma 4.3 and (4.34) there exists $\ell_1\geq\ell$ such that

$\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]+\mathbb E[\|p^\otimes_{l,\boldsymbol T^\otimes}-p^\otimes_{\boldsymbol T^\otimes}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon^4\qquad\text{for all }l\geq\ell_1.$ (4.39)

Moreover, Lemma 4.10 shows that for some $\ell_2\geq\ell$ we have

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R^\otimes(\varepsilon^3,l)]=1\qquad\text{for all }l\geq\ell_2.$ (4.40)

Similarly, (4.34), (4.39), the planted p-symmetry assumption and Lemma 4.10 imply that there is $\ell_3\geq\ell$ such that for any $l>\ell_3$ and large enough $\hat l$ we have

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal R^\otimes(\varepsilon^3,l)]=1.$ (4.41)

Additionally, Corollary 4.9 shows that for a certain $\ell_4\geq\ell$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\ln\mathbb E[Z(\boldsymbol G)\,|\,\mathfrak T_l]\leq\mathcal B_\vartheta(p)-\alpha\,\Big|\,\mathcal A_{100l}\Big]=0\qquad\text{for all }l\geq\ell_4.$ (4.42)

Let $\ell^*=\ell_1+\ell_2+\ell_3+\ell_4$.

Applying Lemma 4.11 to $\underline M^\otimes$, we obtain $\chi^*=\chi^*(\varepsilon,\ell^*)>0$ such that the following is true: let $\mathcal U=\mathcal S(\chi^*)\cap\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal R^\otimes(\varepsilon^3,\ell^*)$ and define $Z'(\boldsymbol G)=\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)$. Then (using (4.35))

$Z'(\boldsymbol G)^2=\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)$
$\qquad\leq2\cdot\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}.$ (4.43)

Further, Proposition 2.5 and Lemma 4.10 imply that

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal S(\chi^*)]=1.$ (4.44)

Combining (3.9), (4.40) and (4.44), we get

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U]=1.$ (4.45)

Now, (4.36), (4.38) and (4.43) give an upper bound on the second moment of $Z'$, namely

$\mathbb E[Z'(\boldsymbol G)^2]\leq2\,\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}\big]\leq2\exp\big(n(2\mathcal B_\vartheta(p)+\alpha)\big).$ (4.46)

As a next step, we are going to show that

$\mathbb E[Z'(\boldsymbol G)]\geq\exp\big(n(\mathcal B_\vartheta(p)-2\alpha)\big).$ (4.47)

Indeed, by Proposition 2.5 and Lemma 4.10 we have

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal S(\chi^*)]=1$ (4.48)

for large enough $\hat l$. Further, the local convergence assumption (3.9) and the construction (3.10) of the planted distribution ensure that for large enough $\hat l$,

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal L(\varepsilon^4,\ell^*)]=1.$

Hence, (4.41) and (4.48) show that for $\hat l$ large enough

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal U]=1.$ (4.49)

Thus, (4.42), (4.49) and Proposition 3.8 yield (4.47).

Finally, combining (4.46) and (4.47) and applying the Paley-Zygmund inequality (1.3), we obtain for large $n$,

$\mathbb P\big[Z(\boldsymbol G)\geq\exp(n(\mathcal B_\vartheta(p)-4\alpha))\big]\geq\mathbb P\big[Z'(\boldsymbol G)\geq\exp(n(\mathcal B_\vartheta(p)-4\alpha))\big]$
$\qquad\geq\frac{\mathbb E[Z'(\boldsymbol G)]^2}{2\,\mathbb E[Z'(\boldsymbol G)^2]}\geq\exp(-10\alpha n).$

Because this holds for any $\alpha>0$, the assertion follows from Proposition 3.2.

4.7. Proof of Theorem 4.5

The key step of the proof is to establish the following statement.

Lemma 4.12

For any $\varepsilon>0$ there exists $\delta>0$ such that for any $\ell>0$ there exists $n_0$ such that for all $n>n_0$ the following is true. Assume that $G\in\mathcal G(M_n)$ satisfies

$\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_G<\delta^9.$ (4.50)

Then $G$ is $(\varepsilon,2)$-symmetric and

$\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon n.$

Before we prove Lemma 4.12 let us show how it implies Theorem 4.5.

Proof of Theorem 4.5

If $G\in\mathcal G(M_n)$ is $(\varepsilon,2)$-symmetric and satisfies $\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon n$, then by the triangle inequality

$\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}\leq\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-\mu_{G,x}\otimes\mu_{G,y}\|_{\mathrm{TV}}$
$\qquad+\|\mu_{G,x}\otimes\mu_{G,y}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}$
$\qquad\leq4\varepsilon n^2.$

Therefore, the theorem follows by applying Lemma 4.12 either to the random factor graph $\boldsymbol G$ or to the random factor graph $\hat{\boldsymbol G}$ chosen from the planted model.

Proof of Lemma 4.12

The proof is morally similar to the one for the special case of the "stochastic block model" from [40]. Let $\gamma=\gamma(\varepsilon)>0$ be sufficiently small. By Theorem 2.1 we can pick $\delta=\delta(\gamma)>0$ small enough so that there exists a partition $(V,S)$ with $\#V+\#S<\delta^{-1}$ with respect to which $\mu_G$ is $\gamma^4$-homogeneous. Suppose that $V_i$, $S_j$ are classes such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$ and such that $\mu_G[\cdot|S_j]$ is $\gamma^4$-regular on $V_i$. We claim that

$\frac1{|V_i|}\sum_{x\in V_i}\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<3\gamma.$ (4.51)

The assertion is immediate from this inequality. Indeed, suppose that (4.51) is true for all $i,j$ such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$ and $\mu_G[\cdot|S_j]$ is $\gamma^4$-regular on $V_i$. Then because $\#V+\#S\leq1/\delta$,

$\sum_{x\in V_n}\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<4\gamma n.$ (4.52)

Hence, by HM1 and Bayes' rule, $\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<5\gamma n<\varepsilon n$. Further, (4.52) and Lemma 2.8 imply that $\mu_G$ is $(\varepsilon,2)$-symmetric (provided that we pick $\gamma$ small enough). Thus, we are left to prove (4.51).

Assume for contradiction that (4.51) is violated for $V_i$, $S_j$ such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$. Then by the triangle inequality there is a set $W\subseteq V_i$ of size at least $\gamma|V_i|$ such that for all $x\in W$ we have

$\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}\geq\gamma.$

For $x\in W$ pick $\omega_x\in\Omega$ such that $|\mu_{G,x}[\omega_x|S_j]-p_{\ell,\partial^\ell[G,x]}(\omega_x)|\geq\gamma$ is maximum. Then by the pigeonhole principle there exist $\omega\in\Omega$ and $W'\subseteq W$, $|W'|\geq|W|/(2|\Omega|)$, such that either

$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\geq p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma\qquad\text{or}$ (4.53)
$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\leq p_{\ell,\partial^\ell[G,x]}(\omega)-\gamma.$ (4.54)

In particular, since the entries of both distributions sum to one (and after shrinking $W'$ by another factor of $|\Omega|$ if necessary), for some $\omega$ we have

$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\geq p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma/|\Omega|.$ (4.55)

We claim that there is a set $L\subseteq W'$ of size $|L|=\lceil1/\delta\rceil$ with the following properties.

  • (i)

    the pairwise distance between any two $x,y\in L$ is at least $10(\ell+1)$.

  • (ii)
    for all $x\in L$ we have
    $\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\mu_G[\cdot|S_j]}<\delta^4.$ (4.56)

Indeed, because $|V_i|\geq\delta^2n$ and $\mu_G(S_j)\geq\delta^2$ the assumption (4.50) implies that

$\sum_{x\in V_i}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\mu_G[\cdot|S_j]}<\delta^5|V_i|.$ (4.57)

Since $|W'|\geq\gamma|V_i|/(2|\Omega|)\geq\delta|V_i|$, (4.57) implies that there is a set $W''\subseteq W'$ of size $|W''|\geq|W'|/2$ such that (4.56) holds for all $x\in W''$. Now, construct a sequence $W''=W_0\supseteq W_1\supseteq\cdots$ inductively as follows. In step $i\geq1$ pick some $x_i\in W_{i-1}$. Then $W_i$ contains all $y\in W_{i-1}\setminus\{x_i\}$ whose distance from $x_i$ is greater than $10(\ell+1)$. Since for each $x_i$ the total number of variable nodes at distance at most $10(\ell+1)$ is bounded by $\Delta^{10(\ell+1)}$ and $|W_0|\geq\delta|V_i|/2\geq\delta^3n/2$, the set $\{x_i:i\geq1\}$ of picked nodes has size at least $\delta^3\Delta^{-10(\ell+1)}n/2>1/\delta$, provided that $n$ is large enough. Finally, simply pick any subset $L\subseteq\{x_i:i\geq1\}$ of size $|L|=\lceil1/\delta\rceil$.

Consider the event $\mathcal E=\{\sigma[\omega|L]\leq|L|^{-1}\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma^3\}$. We claim that

$\mu_G[\bar{\mathcal E}|S_j]\leq2\delta^2/\mu_G(S_j).$ (4.58)

Indeed, by (4.56), Markov's inequality and the union bound we have

$\Big\langle\mathbf 1\big\{\forall x\in L:\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\leq\delta\big\}\Big\rangle_{\mu_G[\cdot|S_j]}\geq1-\sum_{x\in L}\Big\langle\mathbf 1\big\{\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}>\delta\big\}\Big\rangle_{\mu_G[\cdot|S_j]}\geq1-\delta^2.$ (4.59)

Now, let $\mathcal F_L$ be the coarsest $\sigma$-algebra that contains $\mathcal F_\ell(G,x)$ for all $x\in L$. Suppose that $\sigma'\in S_j$ is such that

$\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G(\sigma')-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\leq\delta\qquad\text{for all }x\in L.$ (4.60)

We claim that (4.60) implies

$\big\langle\mathbf 1\{\sigma\in\bar{\mathcal E}\}\,\big|\,\mathcal F_L\big\rangle_G(\sigma')<\delta^3.$ (4.61)

Indeed, let $X=\sum_{x\in L}\mathbf 1\{\sigma(x)=\omega\}$. Then (4.60) implies that

$\big\langle X\,\big|\,\mathcal F_L\big\rangle_G(\sigma')\leq2\delta|L|+\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega).$ (4.62)

Furthermore, the pairwise distance of the variables in $L$ is at least $2(\ell+1)$ and given $\mathcal F_L$ the values of the variables at distance either $\ell$ or $\ell+1$ from each $x\in L$ are fixed. Therefore, given $\mathcal F_L$ the events $\{\sigma(x)=\omega\}$, $x\in L$, are mutually independent. In effect, $X$ is stochastically dominated by a sum of independent random variables. Hence, recalling that $\delta$ is much smaller than $\gamma$, we see that (4.61) follows from (4.62) and the Chernoff bound. Finally, combining (4.59) and (4.61), and observing that the event in (4.59) is $\mathcal F_L$-measurable, we obtain (4.58).
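The domination step can be illustrated by a quick simulation (ours; all numbers arbitrary): for a sum of independent indicators, the upper tail beyond the mean is exponentially small, in line with the Chernoff/Hoeffding bound invoked above.

```python
import numpy as np
rng = np.random.default_rng(0)

# given the boundary sigma-algebra F_L the events {sigma(x)=omega} are
# independent, so X is dominated by a sum of independent Bernoullis;
# a quick look at the Chernoff-type tail used above (numbers ours)
L, p_mean, slack = 200, 0.4, 0.1
X = rng.random((100000, L)) < p_mean
tail = (X.sum(axis=1) >= (p_mean + slack) * L).mean()
print(tail, np.exp(-2 * slack**2 * L))  # empirical tail vs Hoeffding bound
```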

But (4.58) does not sit well with (4.55). In fact, (4.55) entails that $\mu_G[\bar{\mathcal E}|S_j]\geq\gamma^2$; for consider the random variable $Y=\sum_{x\in L}\mathbf 1\{\sigma(x)\neq\omega\}$. Then (4.55) yields $\langle Y\rangle_{\mu_G[\cdot|S_j]}=\sum_{x\in L}(1-\mu_{G,x}[\omega|S_j])\leq|L|(1-\gamma/|\Omega|)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)$. Hence, by Markov's inequality

$\mu_G[\mathcal E|S_j]\leq\frac{\langle Y\rangle_{\mu_G[\cdot|S_j]}}{|L|(1-\gamma^3)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}\leq\frac{|L|(1-\gamma/|\Omega|)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}{|L|(1-\gamma^3)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}$
$\qquad\leq\frac{1-\gamma/|\Omega|}{1-\gamma^3}\leq1-\gamma^2.$

Combining this bound with (4.58), we obtain $\gamma^2\leq\mu_G[\bar{\mathcal E}|S_j]\leq2\delta^2/\mu_G(S_j)$. Thus, choosing $\delta$ much smaller than $\gamma$, we conclude that $\mu_G(S_j)<\delta^{3/2}$, which is a contradiction. Thus, we have established (4.51).

5. CONDITIONING ON THE LOCAL STRUCTURE

5.1. A Generalised Configuration Model

The aim in this section is to prove Proposition 4.7. The obvious problem is the conditioning on the $\sigma$-algebra $\mathfrak T_\ell$ that fixes the depth-$\ell$ neighborhoods of all variable nodes and the depth-$(\ell+1)$ neighborhoods of all constraint nodes. Following [16], we deal with this conditioning by setting up a generalised configuration model.

Recall that $\mathcal T_\ell$ is the (finite) set of all isomorphism classes $\partial^\ell T$ for $T\in\mathcal T_V$ and $\partial^{\ell+1}T$ for $T\in\mathcal T_F$. Let $\ell,n>0$ be integers and let $M=(V,F,d,t,(\psi_a)_{a\in F})$ be a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$. Moreover, let $G\in\mathcal G(M)$ be a $100\ell$-acyclic factor graph. Then we define an enhanced $(\Delta,\Omega,\Psi,\Theta_\ell)$-model $M(G,\ell)$ with type set $\Theta_\ell=(\mathcal T_\ell\cap\mathcal T_V)\times[\Delta]$ as follows. The set of variable nodes is $V$, the set of constraint nodes is $F$, the degrees are given by $d$ and the weight function associated with each constraint $a$ is $\psi_a$, just as in $M$. Moreover, the type of a variable clone $(x,i)$ is $t_{G,\ell}(x,i)=(\partial^\ell[G,x],i)$. Further, the type of a constraint clone $(a,j)$ such that $\partial(G,a,j)=(x,i)$ is $t_{G,\ell}(a,j)=(\partial^\ell[G,x],i)$. Clearly, $\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. The following lemma shows that the model $M(G,\ell)$ can be used to generate factor graphs whose local structure coincides with that of $G$.
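In spirit, the generalised configuration model pairs variable clones with constraint clones of identical type uniformly at random, as in the following miniature sketch (ours; the type labels are arbitrary stand-ins for the pairs $(\partial^\ell[G,x],i)$).

```python
import random
from collections import defaultdict

# a generalised configuration model in miniature: pair variable clones
# with constraint clones of the same type uniformly at random
random.seed(1)
var_clones = [('x1', 1, 'A'), ('x2', 1, 'A'), ('x2', 2, 'B'), ('x3', 1, 'B')]
con_clones = [('a', 1, 'A'), ('a', 2, 'B'), ('b', 1, 'A'), ('b', 2, 'B')]

by_type_v, by_type_c = defaultdict(list), defaultdict(list)
for cl in var_clones:
    by_type_v[cl[2]].append(cl)
for cl in con_clones:
    by_type_c[cl[2]].append(cl)

matching = []
for t in by_type_v:
    vs, cs = by_type_v[t], by_type_c[t][:]
    random.shuffle(cs)          # a uniform pairing within each type class
    matching += list(zip(vs, cs))
print(matching)
```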

Lemma 5.1

Assume that $\ell\geq0$ and that $G'\in\mathcal G(M(G,\ell))$ is $(2\ell+4)$-acyclic. Then $G'$, viewed as an $M$-factor graph, satisfies $G'\cong_\ell G$.

We are going to show inductively for $l\in[\ell]$ that $G'\cong_l G$. The case $l=0$ is immediate from the construction. Thus, assume that $l>0$, let $(x,i)\in C_V$ and let $B$ be the set of all clones that have distance precisely $l-1$ from $(x,i)$. Since $G'$ is $(2l+2)$-acyclic, the pairwise distance of any two clones in $B$ is at least 2. Moreover, by induction we know that $t_{G,l-1}(w,j)=t_{G',l-1}(w,j)$ for all $(w,j)\in B$. Therefore, $t_{G,l}(x,i)=t_{G',l}(x,i)$.

In order to prove Proposition 4.7 we need to enhance the model $M(G,\ell)$ further to accommodate an assignment that provides a value from $\Omega$ for each clone. Thus, let $\hat\sigma:C_V\cup C_F\to\Omega$ be a map. We call $\hat\sigma$ valid if $\hat\sigma(x,i)=\hat\sigma(x,j)$ for all $x\in V,i,j\in[d(x)]$ and if for all $\theta\in\Theta_\ell$ we have

$\forall\omega\in\Omega:\quad|\{(x,i)\in C_V:\hat\sigma(x,i)=\omega,\,t_{G,\ell}(x,i)=\theta\}|=|\{(a,j)\in C_F:\hat\sigma(a,j)=\omega,\,t_{G,\ell}(a,j)=\theta\}|.$

Of course, we can extend a valid $\hat\sigma$ to a map $V\to\Omega,\ x\mapsto\hat\sigma(x,1)$. Given a valid $\hat\sigma$ we define a $(\Delta,\Omega,\Psi,\Theta_\ell\times\Omega)$-model $M(G,\hat\sigma,\ell)$ with variable nodes $V$, constraint nodes $F$, degrees $d$ and weight functions $(\psi_a)_{a\in F}$ such that the type $t_{G,\hat\sigma,\ell}(x,i)$ of a variable clone $(x,i)$ is $(\partial^\ell[G,x],i,\hat\sigma(x,i))$ and such that the type $t_{G,\hat\sigma,\ell}(a,j)$ of a constraint clone $(a,j)$ with $\partial(G,a,j)=(x,i)$ is $(\partial^\ell[G,x],i,\hat\sigma(a,j))$. By construction, $\mathcal G(M(G,\hat\sigma,\ell))\subseteq\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. Let us recall the definition of the distance from (3.5). Further, for two maps $\hat\sigma,\hat\sigma':C_V\cup C_F\to\Omega$ let $\mathrm{dist}(\hat\sigma,\hat\sigma')=|\{(v,i)\in C_V\cup C_F:\hat\sigma(v,i)\neq\hat\sigma'(v,i)\}|$. In Section 5.2 we are going to establish the following.

Lemma 5.2

For any $\varepsilon,\ell>0$ there is $n_0=n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ such that for $n>n_0$ the following holds. If $M$ is a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$, $G\in\mathcal G(M)$ is $100\ell$-acyclic and $\hat\sigma$ is valid, then with probability at least $1-\varepsilon$ the random factor graph $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$ has the following property. There exist a valid $\hat\sigma'$ and a $4\ell$-acyclic $G''\in\mathcal G(M(G,\hat\sigma',\ell))$ such that $\mathrm{dist}(\hat\sigma,\hat\sigma')+\mathrm{dist}(\boldsymbol G',G'')\leq n^{0.9}$.

To proceed consider a $(G,\ell)$-marginal sequence $q$. We call $\hat\sigma$ q-valid if the following two conditions hold.

  • 1
    V1: For all $T\in\mathcal T_V\cap\mathcal T_\ell,\omega\in\Omega$ we have
    $|\{x\in V:\partial^\ell[G,x]=T,\,\hat\sigma(x)=\omega\}|=q_T(\omega)\,|\{x\in V:\partial^\ell[G,x]=T\}|.$
  • 2
    V2: For all $T\in\mathcal T_F\cap\mathcal T_\ell,\omega_1,\dots,\omega_{d_T}\in\Omega$ we have
    $|\{a\in F:\partial^{\ell+1}[G,a]=T,\,\forall j\in[d_T]:\hat\sigma(a,j)=\omega_j\}|$
    $\qquad=q_T(\omega_1,\dots,\omega_{d_T})\,|\{a\in F:\partial^{\ell+1}[G,a]=T\}|.$

Lemma 5.3

For any $\varepsilon,\ell>0$ there is $n_0=n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ such that for $n>n_0$ the following holds. Assume that $M$ is a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$, $G\in\mathcal G(M)$ is $100\ell$-acyclic and $q$ is a $(G,\ell)$-marginal sequence such that there exists a q-valid $\hat\sigma$. Then with the sum ranging over all q-valid $\hat\sigma$ we have

$\exp\big(n\mathcal B_{G,\ell}(q)-\varepsilon n\big)\leq\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\mathcal G(M(G,\ell))|}\leq\exp\big(n\mathcal B_{G,\ell}(q)+\varepsilon n\big).$

We defer the proof of Lemma 5.3 to Section 5.3.

Proof of Proposition 4.7

We claim that

$|\{G'\in\mathcal G(M_n):G'\cong_\ell G\}|\geq|\mathcal G(M(G,\ell))|\exp(-n^{0.91}).$ (5.1)

To see this, apply Lemma 5.2 to the constant map $\hat\sigma:(v,j)\in C_V\cup C_F\mapsto\omega_0$ for some fixed $\omega_0\in\Omega$. Then we conclude that with probability at least $1/2$ the random graph $\boldsymbol G'\in\mathcal G(M(G,\ell))=\mathcal G(M(G,\hat\sigma,\ell))$ is at distance at most $n^{0.9}$ from a $4\ell$-acyclic $G''\in\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. Furthermore, by Lemma 5.1 this factor graph $G''$, viewed as an element of $\mathcal G(M)$, satisfies $G''\cong_\ell G$. Finally, since the total number of factor graphs at distance at most $n^{0.9}$ from $G''$ is bounded by $\exp(n^{0.91})$ because all degrees are bounded, we obtain (5.1).

Let $\delta>0$ be small enough. If $\sigma\in\Sigma(G,\ell,q,\delta)$, then by (4.7) there exists a $(G,\ell)$-marginal sequence $q'$ such that $\sigma\in\Sigma(G,\ell,q',0)$ and $\|q'_T-q_T\|_{\mathrm{TV}}<\delta$ for all $T\in\mathcal T_\ell$. Because $\mathcal T_\ell$ is finite and $\Sigma(G,\ell,q',0)\neq\emptyset$, the total number of such $q'$ is bounded by a polynomial in $n$. Moreover, due to the continuity of $\mathcal B_{G,\ell}(\cdot)$ we can choose $\delta=\delta(\ell)$ small enough so that $|\mathcal B_{G,\ell}(q)-\mathcal B_{G,\ell}(q')|<\varepsilon/2$ for all such $q'$. Hence, summing over all $\hat\sigma$ corresponding to $\sigma\in\Sigma(G,\ell,q,\delta)$, we obtain from (5.1) and Lemma 5.3 that

$\mathbb E\big[Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]\leq\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\{G'\in\mathcal G(M_n):G'\cong_\ell G\}|}\leq\exp\big(n\mathcal B_{G,\ell}(q)+\varepsilon n\big).$

Conversely, by Lemma 5.2 with probability at least $1/2$ the graph $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$ is within distance at most $n^{0.9}$ of a $4\ell$-acyclic $G''$, which satisfies $G''\cong_\ell G$ by Lemma 5.1. As before, the total number of graphs at distance at most $n^{0.9}$ of $G''$ is bounded by $\exp(n^{0.91})$. Similarly, the total number of $\hat\sigma'$ at distance at most $n^{0.9}$ of $\hat\sigma$ is bounded by $\exp(n^{0.91})$. Therefore, by Lemma 5.1

$\mathbb E\big[\mathbf 1\{\mathcal A_{2\ell+5}\}Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]\geq\frac{\exp(-2n^{0.98})}2\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\mathcal G(M(G,\ell))|}\geq\exp\big(n\mathcal B_{G,\ell}(q)-\varepsilon n\big),$

as desired.

5.2. Proof of Lemma 5.2

Let $\Theta^*=\{t_{G,\hat\sigma,\ell}(x,i):(x,i)\in C_V\}$ be the set of all possible types. For each $\tau\in\Theta^*$ let $n_\tau$ be the number of clones $(x,i)\in C_V$ with $t_{G,\hat\sigma,\ell}(x,i)=\tau$. Throughout this section we assume that $n>n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ is sufficiently large.

Lemma 5.4

There exists $\beta>0$ such that the following is true. For any $G,\hat\sigma$ there exists $3/4<\gamma<7/8$ such that for every $\tau\in\Theta^*$ either $n_\tau\leq n^\gamma$ or $n_\tau>n^{\gamma+\beta}$.

The number of possible types is bounded independently of $n$. Hence, choosing $\beta$ small enough, we can ensure that there exists an integer $j>0$ such that $3/4+(j+1)\beta<7/8$ and $[n^{3/4+j\beta},n^{3/4+(j+1)\beta}]\cap\{n_\tau:\tau\in\Theta^*\}=\emptyset$; then $\gamma=3/4+j\beta$ will do.

Fix $\beta,\gamma$ as in the previous lemma. Call $\tau$ rare if $n_\tau\leq n^\gamma$ and common otherwise. Let $Y$ be the number of variable clones that belong to cycles of length at most $10\ell$ in $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$.

Lemma 5.5

For large enough n we have

E[Y]nγlnn

Let R be the set of variable clones (v, i) of a rare type and let U be the set of all variable clones whose distance from R in G does not exceed 100. Since the maximum degree as well as the total number of types are bounded, we have |U||R|lnlnnnγlnn, provided that n is big enough. Thus, to get the desired bound on E[Y] we merely need to consider the set W of common clones that are at distance more than 100 from R.

More specifically, let (v, i) be a common clone. We are going to bound the probability that (v,i)W and that (v, i) lies on a cycle of length at most 10. To this end, we are going to explore the (random) factor graph from (v, i) via the principle of deferred decisions. Let i1=i,,il[Δ] be a sequence of l10 indices. If (v, i) lies on a cycle of length at most 10, then there exists such a sequence (i1,,il) that corresponds to this cycle. Namely, with v1=v the cycle comprises of the clones (v1,i1),,(vl,il) such that (G(M(G,σ^,)),vj,ij)=(vj+1,ij+1). In particular, vl=v1. Clearly, the total number of sequences (i1,,il) is bounded. Furthermore, given that (v l, i l) is common, the probability that vl=v0 is bounded by 2nγ. Since γ>3/4, the linearity of expectation implies that E[Y]|U|+2n1γlnnnγlnn.

Lemma 5.6

Assume that $G'\in\mathcal G(M(G,\hat\sigma,\ell))$ satisfies $Y(G')\leq n^\gamma\ln^2n$. Then there is a $4\ell$-acyclic $G''\in\mathcal G(M(G,\hat\sigma,\ell))$ such that $\mathrm{dist}(G',G'')\leq n^{0.9}$.

Let $R$ be the set of variable clones $(v,i)$ of a rare type and let $U$ be the set of all variable clones whose distance from $R$ in $G'$ does not exceed $10\ell$. Moreover, let $G''\in\mathcal G(M(G,\hat\sigma,\ell))$ minimise $\mathrm{dist}(G',G'')$ subject to the condition that $\partial(G'',v,i)=\partial(G,v,i)$ for all $(v,i)\in U$. Then $\mathrm{dist}(G',G'')\leq n^\gamma\ln n$ because the total number of types is bounded. Therefore, the assumption $Y(G')\leq n^\gamma\ln^2n$ implies that $Y(G'')\leq n^\gamma\ln^3n$, say. In addition, because $G$ is $100\ell$-acyclic, none of the clones in $R$ lies on a cycle of length at most $4\ell$ in $G''$.

Altering only a bounded number of edges in each step, we are now going to remove the short cycles of $G''$ one by one. Let $C$ be the set of common clones. The construction of $G''$ ensures that only common clones lie on cycles of length at most $4\ell$. Consider one such clone $(v,i)$ and let $N$ be the set of all variable clones that can be reached from $(v,i)$ by traversing precisely two edges of $G''$; thus, $N$ contains all clones $(w,j)$ such that $w$ has distance two from $v$ and all clones $(v,j)$ that are incident to the same constraint node as $(v,i)$. Once more by the construction of $G''$ we have $N\subseteq C$. Furthermore, $|N|\leq\Delta^2$.

We claim that there exist $N'\subseteq C$ and a bijection $\xi:N\to N'$ such that the following conditions are satisfied.

  • (i)

    $t_{G,\hat\sigma,\ell}(w,j)=t_{G,\hat\sigma,\ell}(\xi(w,j))$ for all $(w,j)\in N$.

  • (ii)

    the pairwise distance in $G''$ between any two clones in $N'$ is at least $100\ell$.

  • (iii)

    the distance in $G''$ between $N\cup\{(v,i)\}$ and $N'$ is at least $100\ell$.

  • (iv)

    the distance between $R$ and $N'$ is at least $100\ell$.

  • (v)

    any $(w,j)\in N'$ is at distance at least $100\ell$ from any clone that belongs to a cycle of $G''$ of length at most $4\ell$.

Since the maximum degree of $G''$ is bounded by $\Delta$, no more than $n^\gamma\ln^4n$ clones violate condition (iii), (iv) or (v). By comparison, there are at least $n^{\gamma+\beta}$ clones of any common type. Hence, the existence of $\xi$ follows.

Now, obtain $G'''$ from $G''$ as follows.

  • let $\partial(G''',\xi(w,j))=\partial(G'',w,j)$ and $\partial(G''',w,j)=\partial(G'',\xi(w,j))$ for all $(w,j)\in N$.

  • let $\partial(G''',w,j)=\partial(G'',w,j)$ for all $(w,j)\notin N\cup N'$.

It is immediate from the construction that any clone on a cycle of length at most $4\ell$ in $G'''$ also lies on such a cycle of $G''$. Moreover, $(v,i)$ does not lie on a cycle of length at most $4\ell$ in $G'''$. Hence, $Y(G''')<Y(G'')$. In addition, all clones on cycles of length at most $4\ell$ and their neighbours are common. Hence, the construction can be repeated on $G'''$. Since $Y(G'')\leq n^\gamma\ln^3n$, we ultimately obtain a $4\ell$-acyclic $G^*$ with $\mathrm{dist}(G'',G^*)\leq n^\gamma\ln^4n<n^{0.9}$.

Proof of Lemma 5.2

The assertion is immediate from Lemmas 5.5 and 5.6 and Markov's inequality.

5.3. Proof of Lemma 5.3

Let $\mathcal V=\mathcal T_\ell\cap\mathcal T_V$ and for $T\in\mathcal V$ let $n_T$ be the number of variable nodes $x$ such that $\partial^\ell[G,x]=T$. By Stirling's formula the number $|\Sigma(G,\ell,q,0)|$ of assignments $\sigma:V_n\to\Omega$ with marginals as prescribed by $q$ satisfies

$\Big|\ln|\Sigma(G,\ell,q,0)|-\sum_{T\in\mathcal V}n_TH(q_T)\Big|\leq\ln^2n.$ (5.2)

Further, for $T\in\mathcal V$ and $i\in[d_T]$ let $C_V(T,i)$ be the set of all clones $(x,i)\in C_V$ such that $t_{G,\ell}(x,i)=(T,i)$. Moreover, let $C_F(T,i)$ be the set of all clones $(a,j)\in C_F$ such that $t_{G,\ell}(a,j)=(T,i)$. Additionally, let $F(T,i)$ be the set of all pairs $(T',j)$ with $T'\in\mathcal T_F\cap\mathcal T_\ell,j\in[d_{T'}]$ such that there is $(a,j)\in C_F(T,i)$ with $\partial^{\ell+1}[G,a]=T'$. Of course, the total number of perfect matchings between $C_V(T,i)$ and $C_F(T,i)$ equals $n_T!$. If we fix $\sigma\in\Sigma(G,\ell,q,0)$, then any such perfect matching induces an assignment $\hat\sigma:C_F(T,i)\to\Omega$ by mapping a clone $(a,j)\in C_F(T,i)$ matched to $(x,i)$ to the value $\sigma(x)$. Let $B_{T,i}$ be the event that in such a random matching for all $(T',j)\in F(T,i)$ and all $\omega$ we have

$|\{(a,j)\in C_F:\partial^{\ell+1}[G,a]=T',\,\hat\sigma(a,j)=\omega\}|=q_{T'}^j(\omega)\,|\{(a,j)\in C_F:\partial^{\ell+1}[G,a]=T'\}|.$

Moreover, for $(T',j)\in F(T,i)$ let $m_{T'}$ be the number of $a\in F$ such that $\partial^{\ell+1}[G,a]=T'$. Then, writing $\binom{n}{(k_i)_i}$ for multinomial coefficients,

$\mathbb P[B_{T,i}]=\frac1{n_T!}\bigg[\prod_{\omega\in\Omega}\binom{q_T(\omega)n_T}{(q_{T'}^j(\omega)m_{T'})_{(T',j)\in F(T,i)}}\bigg]\times\bigg[\prod_{(T',j)\in F(T,i)}\binom{m_{T'}}{(q_{T'}^j(\omega)m_{T'})_{\omega\in\Omega}}\bigg]\prod_{(T',j)\in F(T,i),\,\omega\in\Omega}(q_{T'}^j(\omega)m_{T'})!$
$\qquad=\binom{n_T}{(q_T(\omega)n_T)_{\omega\in\Omega}}^{-1}\prod_{(T',j)\in F(T,i)}\binom{m_{T'}}{(q_{T'}^j(\omega)m_{T'})_{\omega\in\Omega}}$
$\qquad=\exp\bigg[O(\ln n)-\sum_{(T',j)\in F(T,i)}m_{T'}D\big(q_{T'}^j\|q_T\big)\bigg].$

Multiplying up over all $(T,i)$, we obtain for $B=\bigcap_{T,i}B_{T,i}$

$\mathbb P[B]=\prod_{T\in\mathcal V}\prod_{i\in[d_T]}\mathbb P[B_{T,i}]=\exp\bigg[O(\ln n)-\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}m_TD\big(q_T^j\|q_{T_j}\big)\bigg],$ (5.3)

where the constant hidden in the $O(\cdot)$ depends on $\Delta,\Omega,\Psi,\Theta,\ell$ only.

Further, for $T\in\mathcal T_F\cap\mathcal T_\ell$ let $S_T$ be the event that for every $(\omega_1,\dots,\omega_{d_T})\in\Omega^{d_T}$ we have

$|\{a\in F:\partial^{\ell+1}[G,a]=T,\,\forall j\in[d_T]:\hat\sigma(a,j)=\omega_j\}|$
$\qquad=q_T(\omega_1,\dots,\omega_{d_T})\,|\{a\in F:\partial^{\ell+1}[G,a]=T\}|.$

Then

$\mathbb P[S_T|B]=\binom{m_T}{(m_Tq_T(\omega))_\omega}\prod_{j\in[d_T]}\binom{m_T}{(m_Tq_T^j(\omega))_\omega}^{-1}=\exp\big[O(\ln n)-m_TD\big(q_T\|q_T^1\otimes\cdots\otimes q_T^{d_T}\big)\big].$ (5.4)
 

Moreover,

$D\big(q_T\|q_T^1\otimes\cdots\otimes q_T^{d_T}\big)=-H(q_T)+\sum_{j\in[d_T]}H(q_T^j)=-H(q_T)+\sum_{j\in[d_T]}H(q_{T_j})-\sum_{j\in[d_T]}\big(H(q_{T_j})-H(q_T^j)\big).$ (5.5)

In addition,

$H(q_{T_j})-H(q_T^j)=-\sum_{\omega\in\Omega}q_{T_j}(\omega)\ln q_{T_j}(\omega)+\sum_{\omega\in\Omega}q_T^j(\omega)\ln q_T^j(\omega)$
$\qquad=D\big(q_T^j\|q_{T_j}\big)+\sum_{\omega\in\Omega}\big(q_T^j(\omega)-q_{T_j}(\omega)\big)\ln q_{T_j}(\omega).$ (5.6)

Further, because $q$ is a $(G,\ell)$-marginal sequence, condition MS3 guarantees that

$\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\sum_{j\in[d_T]}\sum_{\omega\in\Omega}\big(q_T^j(\omega)-q_{T_j}(\omega)\big)\ln q_{T_j}(\omega)=0.$ (5.7)

Hence, letting $S=\bigcap_TS_T$, we obtain from (5.4)–(5.7)

$\mathbb P[S|B]=\exp\bigg[O(\ln n)+\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\Big[H(q_T)+\sum_{j\in[d_T]}\big[D\big(q_T^j\|q_{T_j}\big)-H(q_{T_j})\big]\Big]\bigg].$ (5.8)

Once more the constant hidden in the $O(\cdot)$ depends on $\Delta,\Omega,\Psi,\Theta,\ell$ only. Further, given $S\cap B$ we have

$\prod_{a\in F}\psi_a(\sigma)=\exp\bigg[\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\langle\ln\psi_T(\sigma)\rangle_{q_T}\bigg].$ (5.9)

Finally, the assertion follows from (5.2), (5.3), (5.8) and (5.9); note that the divergence terms in (5.3) and (5.8) cancel, leaving precisely the entropy and energy terms of $\mathcal B_{G,\ell}(q)$.
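The identity behind (5.5), namely that the divergence from the product of the marginals equals the entropy defect, can be checked numerically; the sketch below (ours) does so for a random distribution on $\Omega^3$.

```python
import numpy as np
rng = np.random.default_rng(0)

def H(p):
    p = p.ravel()
    return -(p[p > 0] * np.log(p[p > 0])).sum()

# check of the identity in (5.5): for a joint distribution q on Omega^3
# with marginals q_1, q_2, q_3,
#   D(q || q_1 x q_2 x q_3) = -H(q) + sum_j H(q_j)
q = rng.random((3, 3, 3)); q /= q.sum()
margs = [q.sum(axis=tuple(k for k in range(3) if k != j)) for j in range(3)]
prod = np.einsum('i,j,k->ijk', *margs)
D = (q * np.log(q / prod)).sum()
print(D, -H(q) + sum(H(m) for m in margs))  # equal
```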

ACKNOWLEDGEMENTS

The second author thanks Dimitris Achlioptas for inspiring discussions. We also thank two anonymous reviewers for their careful reading and their invaluable comments, which led to an improved version of Corollary 2.4, among other things.

Supported by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.

A preliminary version [10] of this paper, presented by the first author at RANDOM 2015 and by the second author at the RS&A 2015 conference, contained a critical technical error that affected its main results. This present version is based on similar key insights but the main results are different from the ones stated in [10].

Contributor Information

Victor Bapst, Email: bapst@math.uni-frankfurt.de.

Amin Coja‐Oghlan, Email: acoghlan@math.uni-frankfurt.de.

REFERENCES

  • 1. Achlioptas D. and Coja‐Oghlan A., Algorithmic barriers from phase transitions, In Proceedings of 49th FOCS, IEEE, Philadelphia, 2008, pp. 793–802.
  • 2. Achlioptas D. and Moore C., Random k‐SAT: Two moments suffice to cross a sharp threshold, SIAM J Comput 36 (2006), 740–762. [Google Scholar]
  • 3. Achlioptas D. and Naor A., The two possible values of the chromatic number of a random graph, Ann Math 162 (2005), 1333–1349. [Google Scholar]
  • 4. Achlioptas D., Naor A., and Peres Y., Rigorous location of phase transitions in hard optimization problems, Nature 435 (2005), 759–764. [DOI] [PubMed] [Google Scholar]
  • 5. Achlioptas D., Naor A., and Peres Y., On the maximum satisfiability of random formulas, J ACM 54 (2007). [Google Scholar]
  • 6. Achlioptas D. and Peres Y., The threshold for random k‐SAT is $2^k\ln 2-O(k)$, J AMS 17 (2004), 947–973. [Google Scholar]
  • 7. Aldous D., Representations for partially exchangeable arrays of random variables, J Multivariate Anal 11 (1981), 581–598. [Google Scholar]
  • 8. Aldous D. and Steele J., The objective method: probabilistic combinatorial optimization and local weak convergence, In Kesten H. editors, Probability on discrete structures, Encyclopaedia of Mathematical Sciences, Vol. 110, Springer, Berlin, 2004, pp. 1–72. [Google Scholar]
  • 9. Bandyopadhyay A. and Gamarnik D., Counting without sampling: Asymptotics of the log‐partition function for certain statistical physics models, Random Struct Algorithms 33 (2008), 452–479. [Google Scholar]
  • 10. Bapst V. and Coja‐Oghlan A., Harnessing the Bethe free energy, In Proceedings of 19th RANDOM, Leibniz International Proceedings in Informatics, Princeton, 2015, pp. 467–480, Also available as arXiv:1504.03975, version 1.
  • 11. Bapst V. and Coja‐Oghlan A., The condensation phase transition in the regular k‐SAT model, In Proceedings of 20th RANDOM, Leibniz International Proceedings in Informatics, Paris, 2016, pp. 22:1–22:18.
  • 12. Bapst V., Coja‐Oghlan A., Hetterich S., Rassmann F., and Vilenchik D., The condensation phase transition in random graph coloring, Comm Math Phys 341 (2016), 543–606. [Google Scholar]
  • 13. Bapst V., Coja‐Oghlan A., and Rassmann F. A positive temperature phase transition in random hypergraph 2‐coloring, Ann Appl Probab 26 (2016), 1362–1406. [Google Scholar]
  • 14. Barak B., Rao A., Shaltiel R., and Wigderson A., 2‐source dispersers for sub‐polynomial entropy and Ramsey graphs beating the Frankl‐Wilson construction, In Proceedings of 38th STOC, ACM, Seattle, 2006, pp. 671–680.
  • 15. Bayati M., Gamarnik D. and Tetali P., Combinatorial approach to the interpolation method and scaling limits in sparse random graphs, Ann Probab 41 (2013), 4080–4115. [Google Scholar]
  • 16. Bordenave C. and Caputo P., Large deviations of empirical neighborhood distribution in sparse random graphs, Probab Theory Relat Fields 163 (2015), 149–222. [Google Scholar]
  • 17. Coja‐Oghlan A. and Panagiotou K., The asymptotic k‐SAT threshold, Adv Math 288 (2016), 985–1068. [Google Scholar]
  • 18. Coja‐Oghlan A., Perkins W., and Skubch K., Limits of discrete distributions and Gibbs measures on random graphs, preprint, arXiv:1512.06798, 2015.
  • 19. Coja‐Oghlan A. and Perkins W., Belief Propagation on replica symmetric random factor graph models, In Proceedings of 20th RANDOM, Leibniz International Proceedings in Informatics, 2016, pp. 27:1–27:15.
  • 20. Coja‐Oghlan A. and Zdeborová L., The condensation transition in random hypergraph 2‐coloring, In Proceedings of 23rd SODA, ACM‐SIAM, Kyoto, 2012, pp. 241–250.
  • 21. Contucci P., Dommers S., Giardina C., and Starr S., Antiferromagnetic Potts model on the Erdös‐Rényi random graph, Commun Math Phys 323 (2013), 517–554. [Google Scholar]
  • 22. Dembo A. and Montanari A., Ising models on locally tree‐like graphs, Ann Appl Probab 20 (2010), 565–592. [Google Scholar]
  • 23. Dembo A., Montanari A., and Sun N., Factor models on locally tree‐like graphs, Ann Probab 41 (2013), 4162–4213. [Google Scholar]
  • 24. Dembo A., Montanari A., Sly A. and Sun N., The replica symmetric solution for Potts models on d‐regular graphs, Comm Math Phys 327 (2014), 551–575. [Google Scholar]
  • 25. Ding J., Sly A., and Sun N., Satisfiability threshold for random regular NAE‐SAT, In Proceedings of 46th STOC, ACM, New York, 2014, pp. 814–822.
  • 26. Ding J., Sly A., and Sun N., Proof of the satisfiability conjecture for large k , In Proceedings of 47th STOC, ACM, Portland, 2015, pp. 59–68.
  • 27. Durrett R., Probability: Theory and examples, 4th edition, Cambridge University Press, Cambridge, 2010. [Google Scholar]
  • 28. Erdös P., Some remarks on the theory of graphs, Bull Am Math Soc 53 (1947), 292–294. [Google Scholar]
  • 29. Erdös P., Graph theory and probability, Canad J Math 11 (1959), 34–38. [Google Scholar]
  • 30. Franz S. and Leone M., Replica bounds for optimization problems and diluted spin systems, J Stat Phys 111 (2003), 535–564. [Google Scholar]
  • 31. Galanis A., Stefankovic D., and Vigoda E., Inapproximability for antiferromagnetic spin systems in the tree non‐uniqueness region, In Proceedings of 46th STOC, ACM, New York, 2014, pp. 823–831.
  • 32. Guerra F., Broken replica symmetry bounds in the mean field spin glass model, Comm Math Phys 233 (2003), 1–12. [Google Scholar]
  • 33. Hoover D., Relations on probability spaces and arrays of random variables, Preprint, Institute of Advanced Studies, Princeton, 1979.
  • 34. Krzakala F., Montanari A., Ricci‐Tersenghi F., Semerjian G. and Zdeborova L., Gibbs states and the set of solutions of random constraint satisfaction problems, Proc Nat Acad Sci 104 (2007), 10318–10323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Lovász L., Large networks and graph limits, Vol. 60, Colloquium Publications, AMS, Providence, 2012. [Google Scholar]
  • 36. Mézard M. and Montanari A., Information, physics and computation, Oxford University Press, Oxford, 2009. [Google Scholar]
  • 37. Mézard M. and Parisi G., and Zecchina R., Analytic and algorithmic solution of random satisfiability problems, Science 297 (2002), 812–815. [DOI] [PubMed] [Google Scholar]
  • 38. Montanari A. and Shah D., Counting good truth assignments of random k‐SAT formulae, In Proceedings of 18th SODA, ACM‐SIAM, New Orleans, 2007, pp. 1255–1264.
  • 39. Mossel E., Weitz D., and Wormald N., On the hardness of sampling independent sets beyond the tree threshold, Probab Theory Relat Fields 143 (2009), 401–439. [Google Scholar]
  • 40. Mossel E., Neeman J. and Sly A., Reconstruction and estimation in the planted partition model, Probab Theory Relat Fields 162 (2014), 1–31. [Google Scholar]
  • 41. Nešetřil J., A combinatorial classic–sparse graphs with high chromatic number. In Lovász L. et al. editors, Erdös Centennial, Springer, Berlin, 2013. [Google Scholar]
  • 42. Panchenko D. and Talagrand M., Bounds for diluted mean‐fields spin glass models, Probab Theory Relat Fields 130 (2004), 319–336. [Google Scholar]
  • 43. Paley R. and Zygmund A., On some series of functions, (3), Proc Camb Philos Soc 28 (1932), 190–205. [Google Scholar]
  • 44. Panchenko D., Spin glass models from the point of view of spin distributions, Ann Probab 41 (2013), 1315–1361. [Google Scholar]
  • 45. Panchenko D., The Sherrington‐Kirkpatrick model, Springer, Berlin, 2013. [Google Scholar]
  • 46. Richardson T. and Urbanke R., Modern coding theory, Cambridge University Press, Cambridge, 2008. [Google Scholar]
  • 47. Robinson R. and Wormald N., Almost all regular graphs are Hamiltonian, Random Struct Algorithms 5 (1994), 363–374. [Google Scholar]
  • 48. Sly A. and Sun N., The computational hardness of counting in two‐spin models on d‐regular graphs, In Proceedings of 53rd FOCS, IEEE, New Brunswick, 2012, pp. 361–369.
  • 49. Szemerédi E., Regular partitions of graphs, Colloq Inter CNRS 260 (1978), 399–401. [Google Scholar]
  • 50. Tao T., Szemerédi's regularity lemma revisited, Contrib Discrete Math 1 (2006), 8–28. [Google Scholar]
  • 51. Turán P., On a theorem of Hardy and Ramanujan, J London Math Soc 9 (1934), 274–276. [Google Scholar]
