Random Structures & Algorithms. 2016 Oct 21;49(4):694–741. doi: 10.1002/rsa.20692

Harnessing the Bethe free energy

Victor Bapst 1, Amin Coja‐Oghlan 1
PMCID: PMC5153882  PMID: 28035178

ABSTRACT

A wide class of problems in combinatorics, computer science and physics can be described along the following lines. There are a large number of variables ranging over a finite domain that interact through constraints that each bind a few variables and either encourage or discourage certain value combinations. Examples include the k‐SAT problem or the Ising model. Such models naturally induce a Gibbs measure on the set of assignments, which is characterised by its partition function. The present paper deals with the partition function of problems where the interactions between variables and constraints are induced by a sparse random (hyper)graph. According to physics predictions, a generic recipe called the “replica symmetric cavity method” yields the correct value of the partition function if the underlying model enjoys certain properties [Krzakala et al., PNAS (2007) 10318–10323]. Guided by this conjecture, we prove general sufficient conditions for the success of the cavity method. The proofs are based on a “regularity lemma” for probability measures on sets of the form $\Omega^n$ for a finite $\Omega$ and a large $n$ that may be of independent interest. © 2016 Wiley Periodicals, Inc. Random Struct. Alg., 49, 694–741, 2016

Keywords: random graphs, Belief Propagation, cavity method, regularity lemma

1. INTRODUCTION

Despite their simplicity, or perhaps because thereof, the first and the second moment method are the most widely used techniques in probabilistic combinatorics. Erdős famously employed the first moment method to lower‐bound the Ramsey number as well as to establish the existence of graphs of high girth and high chromatic number 28, 29. Even a half‐century on, deterministic constructions cannot hold a candle to these probabilistic results 14, 41. Moreover, the second moment method has been used to count prime factors 51 and Hamilton cycles 47 as well as to determine the two possible values of the chromatic number of a sparse random graph 3.

Yet there are quite a few problems for which the standard first and the second moment methods are too simplistic. The random k‐SAT model is a case in point. There are $n$ Boolean variables $x_1,\dots,x_n$ and $m$ clauses $a_1,\dots,a_m$, where $m=\lceil\alpha n\rceil$ (the real $\alpha n$ rounded up to the next integer) for some fixed $\alpha>0$. Each clause binds $k$ variables, which are chosen independently and uniformly, and discourages them from taking precisely one of the $2^k$ possible truth value combinations. The forbidden combination is chosen uniformly and independently for each clause.

The random k‐SAT instance $\Phi=\Phi_k(n,m)$ naturally gives rise to a probability measure on the set $\{0,1\}^n$ of all Boolean assignments. Indeed, for a given parameter $\beta\geq0$ the Gibbs measure $\mu_{\Phi,\beta}$ is defined by letting

\[
\mu_{\Phi,\beta}(\sigma)=\frac{1}{Z_\beta(\Phi)}\prod_{i=1}^{m}\exp\big(-\beta\,\mathbf{1}\{\sigma\text{ violates }a_i\}\big) \tag{1.1}
\]

for every assignment $\sigma\in\{0,1\}^n$, where

\[
Z_\beta(\Phi)=\sum_{\sigma\in\{0,1\}^n}\prod_{i=1}^{m}\exp\big(-\beta\,\mathbf{1}\{\sigma\text{ violates }a_i\}\big) \tag{1.2}
\]

is called the partition function. Thus, the Gibbs measure weighs assignments according to the number of clauses that they violate. In effect, by tuning $\beta$ we can interpolate between just the uniform distribution on $\{0,1\}^n$ ($\beta=0$) and a measure that strongly favours satisfying assignments ($\beta\to\infty$). Hence, if we think of $\Phi$ as inducing a “height function” $\sigma\mapsto\#\{\text{clauses of }\Phi\text{ violated by }\sigma\}$ on the set of assignments, then varying $\beta$ allows us to explore the resulting landscape. Apart from its intrinsic combinatorial interest, the shape of the height function, the so‐called “Hamiltonian”, governs the performance of algorithms such as the Metropolis process or Simulated Annealing.
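For concreteness, the following minimal sketch evaluates $Z_\beta(\Phi)$ from (1.2) by brute force on a tiny instance. The helper names (`make_formula`, `violates`, `partition_function`) are ours, not from the paper, and the enumeration is of course only feasible for very small $n$.

```python
# A minimal brute-force sketch (ours) of the random k-SAT partition
# function Z_beta(Phi) from (1.2).
import itertools, math, random

def make_formula(n, m, k, rng):
    """A random k-SAT formula: each clause is k variable indices, chosen
    independently and uniformly, plus one uniformly random forbidden
    truth-value combination."""
    return [(tuple(rng.randrange(n) for _ in range(k)),
             tuple(rng.randrange(2) for _ in range(k))) for _ in range(m)]

def violates(sigma, clause):
    """sigma violates the clause iff it hits the forbidden combination."""
    variables, forbidden = clause
    return all(sigma[v] == f for v, f in zip(variables, forbidden))

def partition_function(n, formula, beta):
    """Z_beta(Phi): sum over all 2^n assignments of exp(-beta * #violated)."""
    return sum(math.exp(-beta * sum(violates(s, c) for c in formula))
               for s in itertools.product((0, 1), repeat=n))

phi = make_formula(n=12, m=30, k=3, rng=random.Random(0))
for beta in (0.0, 1.0, 5.0):        # beta = 0 gives exactly 2^12 = 4096
    print(beta, partition_function(12, phi, beta))
```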

To understand the Gibbs measure it is key to get a handle on the partition function $Z_\beta(\Phi)$. Of course, the default approach to this kind of problem would be to apply the first and second moment methods. However, upon closer inspection it emerges that $Z_\beta(\Phi)\leq\exp(-\Omega(n))\,\mathrm{E}[Z_\beta(\Phi)]$ with high probability for any $\alpha,\beta>0$ 5. In other words, the first moment over‐estimates the partition function of a typical random formula by an exponential factor. The reason for this is a “lottery effect”: a tiny minority of formulas make an exceptionally high contribution to $\mathrm{E}[Z_\beta(\Phi)]$. Unsurprisingly, going to the second moment only exacerbates the problem, and thus for any $\alpha,\beta>0$ we find $\mathrm{E}[Z_\beta(\Phi)^2]\geq\exp(\Omega(n))\,\mathrm{E}[Z_\beta(\Phi)]^2$. In other words, the second moment method fails rather spectacularly for all possible parameter combinations.

The first and the second moment method fall victim to similar large deviations effects in many similar “random constraint satisfaction problems”. These problems, ubiquitous in combinatorics, information theory, computer science and physics 4, 36, 46, can be described along the following lines. A random factor graph, chosen either from a uniform distribution (like the random k‐SAT model above) or from a suitable configuration model, induces interactions between the variables and the constraints. The variables range over a fixed finite domain $\Omega$ and each constraint binds a few variables. The constraints come with “weight functions” that either encourage or discourage certain value combinations of the incident variables. Multiplying up the weight functions of all the constraints, just as in (1.1), (1.2), we obtain the Gibbs measure and the partition function.

With the standard first and second moment method drawing a blank, we seem to be at a loss as far as calculating the partition function is concerned. However, physicists have put forward an ingenious albeit non‐rigorous alternative called the cavity method 36. This technique, which applies almost mechanically to any problem that can be described in the language of sparse random factor graphs, yields an explicit conjecture as to the value of the partition function. More specifically, the cavity method comes in several installments. In this paper, we are concerned with the simplest, so‐called “replica symmetric” version.

In one of their key papers 34 physicists hypothesized abstract conditions under which the replica symmetric cavity method yields the correct value of the partition function. The thrust of this paper is to prove corresponding rigorous results. Specifically, according to 34 the replica symmetric cavity method gives the correct answer if the Gibbs measure satisfies certain correlation decay properties. For example, the Gibbs uniqueness condition requires that under the Gibbs measure the value assigned to a variable x is asymptotically independent of the values assigned to the variables at a large distance from x in the factor graph. In Corollary 4.6 below we prove that this condition is indeed sufficient to guarantee the success of the cavity method. Additionally, Theorems 4.4 and 4.5 yield rigorous sufficient conditions in terms of substantially weaker conditions, namely a symmetry property and the non‐reconstruction property.

A key feature of the paper is that we establish these results not for specific examples but generically for a very wide class of factor graph models. Of course, stating and proving general results requires a degree of abstraction. In particular, we resort to the framework of local weak convergence of graph sequences 8, 35. This framework suits the physics predictions well, which come in terms of the “limiting tree” that describes the local structure of a large random factor graph. To be precise, the replica symmetric prediction is given by a functional called the Bethe free energy applied to an (infinite) random tree.

The principal tool to prove these results is a theorem about the structure of probability measures on sets of the form $\Omega^n$ for some fixed finite set $\Omega$ and a large integer $n$, Theorem 2.1 below. We expect that this result, which is inspired by Szemerédi's regularity lemma 49, will be of independent interest. To prove our results about random factor graphs, we combine Theorem 2.1 with the theory of local weak convergence to carry out, completely generically, “smart” first and second moment arguments that avoid the lottery effects that the standard arguments fall victim to.

In Section 2 we begin with the abstract results about probability measures on cubes. Subsequently, in Section 3 we set the stage by introducing the formalism of factor graphs and local weak convergence. Further, in Section 4 we state and prove the main results about Gibbs measures on random factor graphs. Finally, Section 5 contains the proof of a technical result that enables us to control the local structure of random factor graphs.

1.1. Related Work

A detailed (non‐rigorous) discussion of the cavity method can be found in 36. It is known that the replica symmetric version of the cavity method does not always yield the correct value of the partition function. For instance, in some factor graph models there occurs a “condensation phase transition” beyond which the replica symmetric prediction is off 20, 34. The more complex “1‐step replica symmetry breaking (1RSB)” version of the cavity method 37 is expected to yield the correct value of the partition function some way beyond condensation. However, another phase transition called “full replica symmetry breaking” spells doom for even the 1RSB cavity method 36.

The replica symmetric cavity method has been vindicated rigorously in various special cases. For instance, Montanari and Shah 38 proved that in the random k‐SAT model the replica symmetric prediction is correct up to the Gibbs uniqueness threshold. A similar result was obtained by Bandyopadhyay and Gamarnik 9 for graph colorings and independent sets. Furthermore, Dembo, Montanari and Sun 23 proved the replica symmetric conjecture on a class of models with specific types of constraints. A strength of 23 is that the result applies even to sequences of non‐random factor graphs under a local weak convergence assumption. But both 23, 38 are based on the “interpolation method” 30, 32, 42, which entails substantial restrictions on the types of models that can be handled. By contrast, the present proof method is based on a completely different approach centered around the abstract classification of measures on cubes that we present in Section 2.

Since the “vanilla” second moment method fails on the random k‐SAT model, more sophisticated variants have been proposed. The basic idea is to apply the second moment method not to the partition function itself but to a tweaked random variable. For instance, Achlioptas and Moore 2 applied the second moment method to NAE‐satisfying assignments, i.e., both the assignment and its binary inverse satisfy all clauses. However, the number of NAE‐satisfying assignments is exponentially smaller than the total number of satisfying assignments and thus this type of argument cannot yield the typical value of the partition function. The same is true of the more subtle random variable of Achlioptas and Peres 6. Furthermore, the work of Ding, Sly and Sun 26 that yields the precise k‐SAT threshold for large k is based on applying the second moment method to a random variable whose construction is guided by the 1RSB cavity method. Among other things, the random variable from 26 incorporates conditioning on the local structure of the factor graph, an idea that will be fundamental to our arguments as well.

The material of Section 2 of the present paper has recently been investigated from the more analytic viewpoint of the theory of graph limits 18. This leads to a general notion of limits of probability measures on discrete cubes. The article 18 also discusses the connection with the Aldous‐Hoover representation of exchangeable arrays, which has long been known to be related to the theory of graph limits, and Panchenko's notion of asymptotic Gibbs measures 7, 33, 44, 45. A further recent application of the methods of the present paper to two special classes of random factor graph models can be found in 19.

1.2. Notation

If $X$ is a finite set, then we denote by $\mathcal{P}(X)$ the set of probability measures on $X$. Moreover, $\|\cdot\|_{TV}$ signifies the total variation norm. If $\mu$ is a probability measure on a product space $X^V$ for finite sets $X$, $V$ and $U\subseteq V$, then $\mu_U\in\mathcal{P}(X^U)$ denotes the marginal distribution of $\mu$ on $U$. That is, if $(x_u)_{u\in U}\in X^U$, then
\[
\mu_U\big((x_u)_{u\in U}\big)=\sum_{(x_u)_{u\in V\setminus U}\in X^{V\setminus U}}\mu\big((x_v)_{v\in V}\big).
\]

If $U=\{u\}$ for some $u\in V$, then we briefly write $\mu_u$ rather than $\mu_{\{u\}}$. Further, if $S\subseteq X^V$ is an event with $\mu(S)>0$, then $\mu[\,\cdot\,|S]$ is the conditional distribution given $S$. That is, for any event $Q\subseteq X^V$ we have $\mu[Q|S]=\mu[Q\cap S]/\mu[S]$.

The entropy of a probability measure $\mu\in\mathcal{P}(X)$ is denoted by $H(\mu)$. Thus, with the convention that $0\ln 0=0$ we have $H(\mu)=-\sum_{x\in X}\mu(x)\ln\mu(x)$. Further, agreeing that $0\ln\frac{0}{0}=0$ as well, we recall that the Kullback‐Leibler divergence of $\mu,\nu\in\mathcal{P}(X)$ is
\[
D(\nu\|\mu)=\sum_{x\in X}\nu(x)\ln\frac{\nu(x)}{\mu(x)}\in[0,\infty].
\]

We are going to work a lot with probability measures on sets $\Omega^n$ for a (small) finite $\Omega$ and a large integer $n$. If $\mu\in\mathcal{P}(\Omega^n)$, then we write $\sigma_\mu,\tau_\mu$ for two independent samples from $\mu$. Where $\mu$ is obvious from the context we just write $\sigma,\tau$. Additionally, if $X(\sigma)$ is a random variable, then $\langle X(\sigma)\rangle_\mu=\sum_{\sigma\in\Omega^n}\mu(\sigma)X(\sigma)$ stands for the expectation of $X$ with respect to $\mu$. Further, if $\sigma\in\Omega^n$, $U\subseteq[n]$ and $\omega\in\Omega$, then we let
\[
\sigma[\omega|U]=|\sigma^{-1}(\omega)\cap U|/|U|.
\]

Thus, $\sigma[\,\cdot\,|U]$ is a probability distribution on $\Omega$, namely the distribution of $\sigma(\mathbf{x})$ for a uniformly random $\mathbf{x}\in U$. If $U=\{x\}$ for some $x\in[n]$, then we just write $\sigma[\omega|x]$ rather than $\sigma[\omega|\{x\}]$. Clearly, $\sigma[\omega|x]=\mathbf{1}\{\sigma(x)=\omega\}$.
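Since the empirical distribution $\sigma[\,\cdot\,|U]$ and the total variation distance recur throughout, here is a small sketch (ours, not from the paper) of both; the later code sketches reuse these two helpers.

```python
# Helpers (ours) for the empirical distribution sigma[.|U] and the total
# variation distance; reused by the later sketches.
from collections import Counter

def empirical(sigma, U, Omega):
    """sigma[omega|U] = |sigma^{-1}(omega) intersect U| / |U|."""
    counts = Counter(sigma[x] for x in U)
    return {w: counts[w] / len(U) for w in Omega}

def tv(p, q):
    """Total variation distance of two distributions on a finite set."""
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0))
                     for w in set(p) | set(q))

sigma = [0, 1, 1, 0, 1, 0, 0, 1]
print(empirical(sigma, range(8), (0, 1)))        # {0: 0.5, 1: 0.5}
print(tv({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}))  # 0.25
```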

We use the $\langle\cdot\rangle_\mu$ notation for averages over $\mu\in\mathcal{P}(\Omega^n)$ to avoid confusion with averages over other, additional random quantities, for which we reserve the common symbols $\mathrm{E}[\cdot],\mathrm{P}[\cdot]$. Furthermore, we frequently work with conditional expectations. Hence, let us recall that for a probability space $(\mathcal{X},\mathcal{A},P)$, a random variable $X:\mathcal{X}\to\mathbb{R}$ and a $\sigma$‐algebra $\mathfrak{F}\subseteq\mathcal{A}$ the conditional expectation $\mathrm{E}[X|\mathfrak{F}]$ is an $\mathfrak{F}$‐measurable random variable on $\mathcal{X}$ such that for every $\mathfrak{F}$‐measurable event $F$ we have $\mathrm{E}[\mathbf{1}\{F\}\mathrm{E}[X|\mathfrak{F}]]=\mathrm{E}[\mathbf{1}\{F\}X]$. Moreover, recall that the conditional variance is defined as $\mathrm{Var}[X|\mathfrak{F}]=\mathrm{E}[X^2|\mathfrak{F}]-\mathrm{E}[X|\mathfrak{F}]^2$.

In line with the two previous paragraphs, if $Y:\Omega^n\to\mathbb{R}$ is a random variable, $\mu\in\mathcal{P}(\Omega^n)$ and $\mathfrak{F}$ is a $\sigma$‐algebra on $\Omega^n$, then we write $\langle Y|\mathfrak{F}\rangle_\mu$ for the conditional expectation, which is an $\mathfrak{F}$‐measurable random variable $\sigma\in\Omega^n\mapsto\langle Y|\mathfrak{F}\rangle_\mu(\sigma)$. Accordingly, for an event $A\subseteq\Omega^n$ with $\mu(A)>0$ we write $\langle Y|A\rangle_\mu=\langle Y\mathbf{1}\{A\}\rangle_\mu/\mu(A)$ for the expectation of $Y$ given $A$.

Finally, we need the Paley‐Zygmund inequality [43, Lemma 19]: if $X\geq0$ is a random variable with a finite second moment, then
\[
P[X\geq t\,\mathrm{E}[X]]\geq(1-t)^2\frac{\mathrm{E}[X]^2}{\mathrm{E}[X^2]}\quad\text{for any }0<t<1. \tag{1.3}
\]

2. PROBABILITY MEASURES ON THE CUBE

In this section we present a general “regularity lemma” for probability measures on sets $\Omega^n$ for some finite set $\Omega$ and a large integer $n$ (Theorem 2.1 below).

2.1. Examples

Needless to say, probability distributions on sets $\Omega^n$ for a small finite $\Omega$ and a large integer $n$ are ubiquitous. To get an idea of what we might hope to prove about them in general, let us look at a few examples.

The simplest case certainly is a product measure $\mu=p^{\otimes n}$ with $p\in\mathcal{P}(\Omega)$. By the Chernoff bound, for any fixed $\varepsilon>0$ there is $n_0=n_0(\varepsilon,\Omega)>0$ such that for $n>n_0$ we have
\[
\big\langle\|\sigma[\,\cdot\,|U]-p\|_{TV}\big\rangle_\mu<\varepsilon\quad\text{for every }U\subseteq[n]\text{ such that }|U|\geq\varepsilon n. \tag{2.1}
\]

In words, if we fix a large enough set U of coordinates and then choose σ randomly, then with probability close to one the empirical distribution on U will be close to p.

As a twist on the previous example, let $p\in\mathcal{P}(\Omega)$, assume that $n$ is a square and define a measure $\mu$ by letting
\[
\mu(\omega_1,\dots,\omega_n)=\prod_{i=0}^{\sqrt n-1}\Big[p(\omega_{1+i\sqrt n})\,\mathbf{1}\{\forall j\in[\sqrt n]:\omega_{j+i\sqrt n}=\omega_{1+i\sqrt n}\}\Big].
\]
In words, the coordinates come in blocks of size $\sqrt n$. While the values of all the coordinates in one block coincide and have distribution $p$, the coordinates in different blocks are independent. Although $\mu$ is not a product distribution, (2.1) is satisfied for any fixed $\varepsilon>0$ and large enough $n$. Furthermore, if for a fixed $k>1$ we choose $x_1,\dots,x_k\in[n]$ uniformly and independently, then
\[
\mathrm{E}\big\|\mu_{\{x_1,\dots,x_k\}}-\mu_{x_1}\otimes\cdots\otimes\mu_{x_k}\big\|_{TV}<\varepsilon, \tag{2.2}
\]
provided that $n>n_1(\varepsilon,k,\Omega)$ is sufficiently large. This is because for large enough $n$ it is unlikely that two of the randomly chosen $x_1,\dots,x_k$ belong to the same block.

As a third example, consider the set $\Omega=\{0,1\}$ and the measure $\mu$ defined by
\[
\mu^{(0)}(\omega_1,\dots,\omega_n)=\Big(\tfrac13\Big)^{\sum_{i=1}^n\omega_i}\Big(\tfrac23\Big)^{n-\sum_{i=1}^n\omega_i},\qquad
\mu^{(1)}(\omega_1,\dots,\omega_n)=\Big(\tfrac23\Big)^{\sum_{i=1}^n\omega_i}\Big(\tfrac13\Big)^{n-\sum_{i=1}^n\omega_i},\qquad
\mu=\tfrac12\big(\mu^{(0)}+\mu^{(1)}\big).
\]
All the marginals $\mu_i$, $i\in[n]$, are equal to the uniform distribution on $\{0,1\}$. But of course the uniform distribution on $\Omega^n$ is a horrible approximation to $\mu$. Indeed, by the Chernoff bound, with overwhelming probability a point $(\omega_1,\dots,\omega_n)$ drawn from $\mu$ either satisfies $\frac1n\sum_{i=1}^n\omega_i\approx1/3$ or $\frac1n\sum_{i=1}^n\omega_i\approx2/3$. However, the conditional distribution given, say, $\frac1n\sum_{i=1}^n\omega_i\leq1/2$, is close to a product measure. Thus, $\mu$ induces a decomposition of $\Omega^n$ into two “states” $S_0=\{\frac1n\sum_{i=1}^n\omega_i\leq1/2\}$, $S_1=\{\frac1n\sum_{i=1}^n\omega_i>1/2\}$ such that $\mu[\,\cdot\,|S_0],\mu[\,\cdot\,|S_1]$ are close to product measures.
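A quick simulation, under our own naming, illustrates this decomposition: samples from $\mu$ have empirical mean concentrated near $1/3$ or $2/3$, and conditioning on the state $S_0$ recovers (approximately) the product measure $\mu^{(0)}$.

```python
# Simulation sketch (ours) of the two-state mixture example.
import random

def sample_mu(n, rng):
    p = rng.choice((1 / 3, 2 / 3))        # pick the mixture component
    return [int(rng.random() < p) for _ in range(n)]

rng, n = random.Random(1), 10_000
means = [sum(sample_mu(n, rng)) / n for _ in range(200)]
state0 = [m for m in means if m <= 0.5]   # samples falling into S_0
print(min(state0), max(state0))           # both close to 1/3
```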

As a final example, consider $\Omega=\{0,1\}$, assume that $n$ is even and define $\mu\in\mathcal{P}(\Omega^n)$ by letting
\[
\mu(\omega_1,\dots,\omega_n)=\Big(\tfrac12\Big)^{n/2}\Big(\tfrac13\Big)^{\sum_{i>n/2}\omega_i}\Big(\tfrac23\Big)^{n/2-\sum_{i>n/2}\omega_i}.
\]
In words, $\mu$ is a product measure with marginal distribution ${\rm Be}(1/2)$ on the first $n/2$ coordinates and ${\rm Be}(1/3)$ on the other coordinates. Clearly, $\mu$ satisfies (2.1) with $p={\rm Be}(1/2)$ for sets $U\subseteq[n/2]$ and with $p={\rm Be}(1/3)$ for sets $U\subseteq[n]\setminus[n/2]$, provided that $n$ is large.

In summary, the following picture emerges. The conditions (2.1) and (2.2) are proxies for saying that a given measure $\mu$ resembles a product measure. Furthermore, in order to obtain from a given $\mu$ measures that satisfy (2.1) or (2.2) it may be necessary to decompose the space $\Omega^n$ into “states” so that the conditional distributions have these properties. In addition, because different coordinates may have different marginal distributions, for (2.1) to hold it may be necessary to partition the set $[n]$ of coordinates.

2.2. Homogeneity

The main result of this section shows that by partitioning the space $\Omega^n$ and/or the set $[n]$ of coordinates it is always possible to “approximate” a given measure $\mu$ by measures that satisfy (2.1) for some suitable $p$ as well as (2.2). In fact, the number of parts that we have to partition $[n]$ and $\Omega^n$ into is bounded in terms of the desired accuracy alone, independently of $n$.

Let us introduce some terminology. If $\mathcal{V}=(V_1,\dots,V_k)$ is a partition of some set $V$, then we call $\#\mathcal{V}=k$ the size of $\mathcal{V}$. Furthermore, a partition $\mathcal{W}=(W_1,\dots,W_l)$ refines another partition $\mathcal{V}=(V_1,\dots,V_k)$ if for each $i\in[l]$ there is $j\in[k]$ such that $W_i\subseteq V_j$.

For $\varepsilon>0$ we say that $\mu\in\mathcal{P}(\Omega^n)$ is $\varepsilon$‐regular on a set $U\subseteq[n]$ if for every subset $W\subseteq U$ of size $|W|\geq\varepsilon|U|$ we have
\[
\big\langle\|\sigma[\,\cdot\,|W]-\sigma[\,\cdot\,|U]\|_{TV}\big\rangle_\mu<\varepsilon.
\]
Further, $\mu$ is $\varepsilon$‐regular with respect to a partition $\mathcal{V}$ if there is a set $J\subseteq[\#\mathcal{V}]$ such that $\sum_{i\in[\#\mathcal{V}]\setminus J}|V_i|<\varepsilon n$ and such that $\mu$ is $\varepsilon$‐regular on $V_i$ for all $i\in J$. Additionally, if $\mathcal{V}$ is a partition of $[n]$ and $\mathcal{S}$ is a partition of $\Omega^n$, then we say that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$ if there is a subset $I\subseteq[\#\mathcal{S}]$ such that the following is true.

  • HM1: We have $\mu(S_i)>0$ for all $i\in I$ and $\sum_{i\in[\#\mathcal{S}]\setminus I}\mu(S_i)<\varepsilon$.

  • HM2: for all $i\in[\#\mathcal{S}]$ and $j\in[\#\mathcal{V}]$ we have $\max_{\sigma,\sigma'\in S_i}\|\sigma[\,\cdot\,|V_j]-\sigma'[\,\cdot\,|V_j]\|_{TV}<\varepsilon$.

  • HM3: for all $i\in I$ the measure $\mu[\,\cdot\,|S_i]$ is $\varepsilon$‐regular with respect to $\mathcal{V}$.

  • HM4: $\mu$ is $\varepsilon$‐regular with respect to $\mathcal{V}$.
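The regularity condition above can at least be probed numerically. The sketch below — our construction, assuming only a sampler `draw()` for $\mu$ and the helpers from the Section 1.2 sketch — estimates $\langle\|\sigma[\,\cdot\,|W]-\sigma[\,\cdot\,|U]\|_{TV}\rangle_\mu$ for a few random subsets $W$; since $\varepsilon$‐regularity quantifies over all large $W$, this is a heuristic probe rather than a verification.

```python
# Monte Carlo probe (ours) of epsilon-regularity on a set U.
import random

def regularity_probe(draw, U, Omega, eps, samples=200, subsets=20,
                     rng=random):
    """Largest estimated <||sigma[.|W] - sigma[.|U]||_TV>_mu over a few
    random W of size eps*|U|; uses empirical() and tv() from Section 1.2."""
    worst = 0.0
    for _ in range(subsets):
        W = rng.sample(list(U), max(1, int(eps * len(U))))
        sigmas = [draw() for _ in range(samples)]
        avg = sum(tv(empirical(s, W, Omega), empirical(s, U, Omega))
                  for s in sigmas) / samples
        worst = max(worst, avg)
    return worst   # epsilon-regularity on U would require worst < eps
```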

Theorem 2.1

For any $\varepsilon>0$ there exists $N=N(\varepsilon,\Omega)>0$ such that for every $n>N$, any measure $\mu\in\mathcal{P}(\Omega^n)$ and any partition $\mathcal{V}_0$ of $[n]$ of size $\#\mathcal{V}_0\leq1/\varepsilon$ the following is true. There exist a refinement $\mathcal{V}$ of $\mathcal{V}_0$ and a partition $\mathcal{S}$ of $\Omega^n$ such that $\#\mathcal{V}+\#\mathcal{S}\leq N$ and such that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$.

Informally speaking, Theorem 2.1 shows that any probability measure $\mu\in\mathcal{P}(\Omega^n)$ admits a partition $(\mathcal{V},\mathcal{S})$ such that the following is true. Almost the entire probability mass of $\mu$ belongs to parts $S_i$ such that the conditional measure $\mu[\,\cdot\,|S_i]$ is $\varepsilon$‐regular w.r.t. $\mathcal{V}$. This means that almost every coordinate $x\in[n]$ belongs to a class $V_j$ such that for every “large” $U\subseteq V_j$, for $\sigma$ chosen from $\mu[\,\cdot\,|S_i]$ the empirical distribution $\sigma[\,\cdot\,|U]$ is very likely close to the marginal distribution $\langle\sigma[\,\cdot\,|V_j]\rangle_{\mu[\cdot|S_i]}$ of the entire class.

Theorem 2.1 and its proof, which we defer to Section 2.3, are inspired by Szemerédi's regularity lemma 49. Let us proceed to state a few consequences of Theorem 2.1.

An $(\varepsilon,k)$‐state of $\mu$ is a set $S\subseteq\Omega^n$ such that $\mu(S)>0$ and
\[
\frac{1}{n^k}\sum_{x_1,\dots,x_k\in[n]}\big\|\mu_{\{x_1,\dots,x_k\}}[\,\cdot\,|S]-\mu_{x_1}[\,\cdot\,|S]\otimes\cdots\otimes\mu_{x_k}[\,\cdot\,|S]\big\|_{TV}<\varepsilon.
\]

In other words, if we choose $x_1,\dots,x_k\in[n]$ independently and uniformly at random, then the expected total variation distance between the joint distribution $\mu_{\{x_1,\dots,x_k\}}[\,\cdot\,|S]$ of $\sigma(x_1),\dots,\sigma(x_k)$ and the product $\mu_{x_1}[\,\cdot\,|S]\otimes\cdots\otimes\mu_{x_k}[\,\cdot\,|S]$ of the marginal distributions is small.
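The defining quantity of an $(\varepsilon,k)$‐state can likewise be estimated from samples. The sketch below (ours; it assumes a list of samples drawn from $\mu[\,\cdot\,|S]$ and reuses `tv` from the Section 1.2 sketch) averages the distance between the empirical joint law of $k$ random coordinates and the product of the empirical marginals.

```python
# Sampling estimate (ours) of the (eps,k)-state defect of mu[.|S].
import itertools, math, random
from collections import Counter

def state_defect(samples, k, Omega, trials=50, rng=random):
    n, N = len(samples[0]), len(samples)
    total = 0.0
    for _ in range(trials):
        xs = [rng.randrange(n) for _ in range(k)]
        joint = {t: c / N for t, c in
                 Counter(tuple(s[x] for x in xs) for s in samples).items()}
        marg = [{w: sum(s[x] == w for s in samples) / N for w in Omega}
                for x in xs]
        prod = {t: math.prod(m[w] for m, w in zip(marg, t))
                for t in itertools.product(Omega, repeat=k)}
        total += tv(joint, prod)   # tv() from the Section 1.2 sketch
    return total / trials          # small value: S behaves like a state
```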

Corollary 2.2

For any $\varepsilon>0$, $k\geq2$ there exists $\eta=\eta(\varepsilon,k,\Omega)>0$ such that for every $n>1/\eta$ any measure $\mu\in\mathcal{P}(\Omega^n)$ has pairwise disjoint $(\varepsilon,k)$‐states $S_1,\dots,S_N$ such that $\mu(S_i)\geq\eta$ for all $i\in[N]$ and $\sum_{i=1}^N\mu(S_i)\geq1-\varepsilon$.

Thus, we can chop the space $\Omega^n$ into subsets $S_1,\dots,S_N$, $N\leq1/\eta$, that capture almost the entire probability mass such that $\mu[\,\cdot\,|S_i]$ “resembles a product measure” for each $i\in[N]$. We prove Corollary 2.2 in Section 2.4.

Let us call $\mu$ $(\varepsilon,k)$‐symmetric if $S=\Omega^n$ itself is an $(\varepsilon,k)$‐state.

Corollary 2.3

For any $\varepsilon,k$ there exists $\delta$ such that for any $\xi>0$ there is $\eta>0$ such that for all $n>1/\eta$ and all $\mu\in\mathcal{P}(\Omega^n)$ the following is true. If for any two $(\xi,2)$‐states $S_1,S_2$ with $\mu(S_1),\mu(S_2)\geq\eta$ we have
\[
\frac1n\sum_{x\in[n]}\big\|\mu_x[\,\cdot\,|S_1]-\mu_x[\,\cdot\,|S_2]\big\|_{TV}<\delta, \tag{2.3}
\]
then $\mu$ is $(\varepsilon,k)$‐symmetric.

Thus, the entire measure μ “resembles a product measure” if extensive states have similar marginal distributions. Conversely, we have the following.

Corollary 2.4

For any $\varepsilon>0,\eta>0$ there exists $\delta>0$ such that for all $n>1/\delta$ and all $\mu\in\mathcal{P}(\Omega^n)$ the following is true. If $\mu$ is $(\delta,2)$‐symmetric, then for any $S$ with $\mu(S)\geq\eta$ we have
\[
\frac1n\sum_{x\in[n]}\big\|\mu_x[\,\cdot\,|S]-\mu_x\big\|_{TV}<\varepsilon.
\]

The proofs of Corollaries 2.3 and 2.4 can be found in Sections 2.5 and 2.6, respectively. Finally, in Section 2.7 we prove the following fact that will be useful in Section 4.

Proposition 2.5

For any $\varepsilon>0$ there exists $\delta>0$ such that for large enough $n$ the following is true. If $\mu\in\mathcal{P}(\Omega^n)$ is $(\delta,2)$‐symmetric, then $\mu\otimes\mu\in\mathcal{P}(\Omega^n\times\Omega^n)$ is $(\varepsilon,2)$‐symmetric.

2.3. Proof of Theorem 2.1

Throughout this section we assume that $n$ is sufficiently large. To prove Theorem 2.1, guided by 49 we define the index of $\mu$ with respect to a partition $\mathcal{V}$ of $[n]$ as
\[
{\rm ind}_\mu(\mathcal{V})=\frac{1}{|\Omega|\,n}\sum_{\omega\in\Omega}\sum_{j\in[\#\mathcal{V}]}\sum_{x\in V_j}\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j])^2\big\rangle_\mu.
\]

The index can be viewed as a conditional variance (cf. 50). Indeed, choose $\mathbf{x}\in[n]$ uniformly and independently of $\sigma$. Furthermore, let $\mathfrak{F}_{\mathcal{V}}$ be the $\sigma$‐algebra generated by the events $\{\mathbf{x}\in V_i\}$ for $i\in[\#\mathcal{V}]$. Writing $\mathrm{E}[\cdot]$ and $\mathrm{Var}[\cdot]$ for the expectation and variance with respect to the choice of $\mathbf{x}$ only, we see that
\[
{\rm ind}_\mu(\mathcal{V})=\frac{1}{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{V}}]\big\rangle_\mu.
\]
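For a measure given explicitly as a dictionary of probabilities, the index can be computed directly from its definition. The following sketch (ours) does exactly that, reusing `empirical` from the Section 1.2 sketch.

```python
# Direct evaluation (ours) of ind_mu(V) for an explicit mu.
def index(mu, partition, Omega):
    """mu: dict mapping assignments (tuples over Omega) to probabilities;
    partition: list of lists of coordinates."""
    n = len(next(iter(mu)))
    ind = 0.0
    for sigma, weight in mu.items():
        for V in partition:
            block = empirical(sigma, V, Omega)   # sigma[.|V_j]
            for x in V:
                for w in Omega:
                    point = 1.0 if sigma[x] == w else 0.0   # sigma[w|x]
                    ind += weight * (point - block[w]) ** 2
    return ind / (len(Omega) * n)
```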

Lemma 2.6

For any partition $\mathcal{V}$ of $[n]$ we have ${\rm ind}_\mu(\mathcal{V})\in[0,1]$. If $\mathcal{W}$ is a refinement of $\mathcal{V}$, then ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})$.

The fact that ${\rm ind}_\mu(\mathcal{V})\in[0,1]$ is immediate from the definition. Moreover, if $\mathcal{W}$ refines $\mathcal{V}$, then $\mathfrak{F}_{\mathcal{V}}\subseteq\mathfrak{F}_{\mathcal{W}}$. Consequently, $\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]|\mathfrak{F}_{\mathcal{W}}]\rangle_\mu\leq\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]|\mathfrak{F}_{\mathcal{V}}]\rangle_\mu$. Averaging over $\omega\in\Omega$ yields ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})$.

Lemma 2.7

If $\mu\in\mathcal{P}(\Omega^n)$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}$, then there is a refinement $\mathcal{W}$ of $\mathcal{V}$ such that $\#\mathcal{W}\leq2\#\mathcal{V}$ and ${\rm ind}_\mu(\mathcal{W})\leq{\rm ind}_\mu(\mathcal{V})-\varepsilon^4/(4|\Omega|^3)$.

Let $\bar J$ be the set of all indices $j\in[\#\mathcal{V}]$ such that there exists $U\subseteq V_j$ of size $|U|\geq\varepsilon|V_j|$ such that
\[
\big\langle\|\sigma[\,\cdot\,|U]-\sigma[\,\cdot\,|V_j]\|_{TV}\big\rangle_\mu\geq\varepsilon. \tag{2.4}
\]
Since $\mu$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}$ we have
\[
\sum_{j\in\bar J}|V_j|\geq\varepsilon n. \tag{2.5}
\]
For each $j\in\bar J$ pick a set $U_j\subseteq V_j$, $|U_j|\geq\varepsilon|V_j|$, such that (2.4) is satisfied. Then there exists $\omega_j\in\Omega$ such that
\[
\big\langle|\sigma[\omega_j|U_j]-\sigma[\omega_j|V_j]|\big\rangle_\mu\geq\varepsilon/(2|\Omega|). \tag{2.6}
\]

Let $\mathcal{W}$ be the partition obtained from $\mathcal{V}$ by splitting each class $V_j$, $j\in\bar J$, into the sub‐classes $U_j$ and $V_j\setminus U_j$. Clearly, $\#\mathcal{W}\leq2\#\mathcal{V}$. Furthermore,

\[
\begin{aligned}
{\rm ind}_\mu(\mathcal{V})&=\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{V}}]\big\rangle_\mu
=\frac1{|\Omega|}\sum_{\omega\in\Omega}\Big(\big\langle\mathrm{E}\,\mathrm{Var}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\big\rangle_\mu+\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\Big)\\
&={\rm ind}_\mu(\mathcal{W})+\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu. \tag{2.7}
\end{aligned}
\]

If $j\in\bar J$ then (2.6) implies that on $V_j$ we have
\[
\big\langle\mathrm{Var}\big[\mathrm{E}[\sigma[\omega_j|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\geq\frac{|U_j|}{|V_j|}\big\langle(\sigma[\omega_j|U_j]-\sigma[\omega_j|V_j])^2\big\rangle_\mu\geq\frac{\varepsilon^3}{4|\Omega|^2}. \tag{2.8}
\]

Hence, combining (2.5) and (2.8), we find
\[
\frac1{|\Omega|}\sum_{\omega\in\Omega}\big\langle\mathrm{E}\,\mathrm{Var}\big[\mathrm{E}[\sigma[\omega|\mathbf{x}]\,|\,\mathfrak{F}_{\mathcal{W}}]\,\big|\,\mathfrak{F}_{\mathcal{V}}\big]\big\rangle_\mu\geq\frac{\varepsilon^4}{4|\Omega|^3}. \tag{2.9}
\]

Finally, the assertion follows from (2.7) and (2.9).

Proof of Theorem 2.1

The set $\mathcal{P}(\Omega)$ is compact. Therefore, there exists a partition $\mathcal{Q}=(Q_1,\dots,Q_K)$ of $\mathcal{P}(\Omega)$ into pairwise disjoint sets such that for all $i\in[K]$ and any two measures $\mu,\mu'\in Q_i$ we have $\|\mu-\mu'\|_{TV}<\varepsilon$.

Given any partition $\mathcal{W}$ of $[n]$, we can construct a corresponding decomposition $\mathcal{S}(\mathcal{W})$ of $\Omega^n$ as follows. Call $\sigma,\sigma'\in\Omega^n$ $\mathcal{W}$‐equivalent if for every $i\in[\#\mathcal{W}]$ there exists $j\in[\#\mathcal{Q}]$ such that $\sigma[\,\cdot\,|W_i],\sigma'[\,\cdot\,|W_i]\in Q_j$. Then $\mathcal{S}(\mathcal{W})$ consists of the equivalence classes.

We construct the desired partition $\mathcal{V}$ of $[n]$ inductively, starting from any given partition $\mathcal{V}^{(0)}$ of size at most $1/\varepsilon$. The construction stops once $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(t)},\mathcal{S}(\mathcal{V}^{(t)}))$. Assuming that this is not the case, we obtain $\mathcal{V}^{(t+1)}$ from $\mathcal{V}^{(t)}$ as follows. If $\mu$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}^{(t)}$, then we let $\mathcal{V}^{(t+1)}$ be the partition promised by Lemma 2.7, which guarantees that
\[
\#\mathcal{V}^{(t+1)}\leq2\#\mathcal{V}^{(t)}\quad\text{and}\quad{\rm ind}_\mu(\mathcal{V}^{(t+1)})\leq{\rm ind}_\mu(\mathcal{V}^{(t)})-\varepsilon^4/(4|\Omega|^3). \tag{2.10}
\]

Otherwise let $\mathcal{S}^{(t)}=\mathcal{S}(\mathcal{V}^{(t)})$ and $s(t)=\#\mathcal{S}^{(t)}$ for the sake of brevity. Further, let $\mu_{i,t}=\mu[\,\cdot\,|S_i^{(t)}]$ for $i\in[s(t)]$ with $\mu[S_i^{(t)}]>0$. Moreover, let $\bar I(t)$ be the set of all $i\in[s(t)]$ such that $\mu[S_i^{(t)}]>0$ and $\mu_{i,t}$ fails to be $\varepsilon$‐regular with respect to $\mathcal{V}^{(t)}$. If $\mu$ fails to be $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(t)},\mathcal{S}^{(t)})$ but $\mu$ is $\varepsilon$‐regular w.r.t. $\mathcal{V}^{(t)}$, then
\[
\sum_{i\in\bar I(t)}\mu[S_i^{(t)}]\geq\varepsilon. \tag{2.11}
\]

Lemma 2.7 shows that for any $i\in\bar I(t)$ there exists a refinement $\mathcal{W}(t,i)$ of $\mathcal{V}^{(t)}$ such that
\[
{\rm ind}_{\mu_{i,t}}(\mathcal{W}(t,i))\leq{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})-\varepsilon^4/(4|\Omega|^3). \tag{2.12}
\]

Let $\mathcal{V}^{(t+1)}$ be the coarsest common refinement of all the partitions $(\mathcal{W}(t,i))_{i\in\bar I(t)}$. Then
\[
\#\mathcal{V}^{(t+1)}\leq\#\mathcal{V}^{(t)}\cdot2^{\#\mathcal{Q}^{\#\mathcal{V}^{(t)}}}. \tag{2.13}
\]

In addition, (2.12) and Lemma 2.6 imply
\[
{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t+1)})\leq{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})-\mathbf{1}\{i\in\bar I(t)\}\,\varepsilon^4/(4|\Omega|^3). \tag{2.14}
\]

Therefore, by (2.11), (2.14) and Bayes’ rule,
\[
\begin{aligned}
{\rm ind}_\mu(\mathcal{V}^{(t+1)})&=\frac{1}{n|\Omega|}\sum_{\omega\in\Omega}\sum_{j\in[\#\mathcal{V}^{(t+1)}]}\sum_{x\in V_j^{(t+1)}}\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j^{(t+1)}])^2\big\rangle_\mu\\
&=\frac{1}{n|\Omega|}\sum_{\omega,j,x}\ \sum_{i\in[s(t)]:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,\big\langle(\sigma[\omega|x]-\sigma[\omega|V_j^{(t+1)}])^2\big\rangle_{\mu_{i,t}}\\
&=\sum_{i:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t+1)})
\leq-\varepsilon^5/(4|\Omega|^3)+\sum_{i:\,\mu[S_i^{(t)}]>0}\mu[S_i^{(t)}]\,{\rm ind}_{\mu_{i,t}}(\mathcal{V}^{(t)})\\
&={\rm ind}_\mu(\mathcal{V}^{(t)})-\varepsilon^5/(4|\Omega|^3). \tag{2.15}
\end{aligned}
\]
Combining (2.10), (2.15) and Lemma 2.6, we conclude that $\mu$ is $\varepsilon$‐homogeneous with respect to $(\mathcal{V}^{(T)},\mathcal{S}^{(T)})$ for some $T\leq4|\Omega|^3/\varepsilon^5$. Finally, (2.13) entails that $\#\mathcal{V}^{(T)},\#\mathcal{S}^{(T)}$ are bounded in terms of $\varepsilon,\Omega$ only.

2.4. Proof of Corollary 2.2

To derive Corollary 2.2 from Theorem 2.1 we use the following handy sufficient condition for (ε,k)‐symmetry.

Lemma 2.8

For any $k\geq2,\varepsilon>0$ there is $\delta=\delta(\varepsilon,k,\Omega)$ such that for large enough $n$ the following is true. Assume that $\mu\in\mathcal{P}(\Omega^n)$ is $\delta$‐regular with respect to a partition $\mathcal{V}$ and set $\bar\mu_i(\cdot)=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$ for $i\in[\#\mathcal{V}]$. If
\[
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu<\delta, \tag{2.16}
\]
then $\mu$ is $(\varepsilon,k)$‐symmetric.

Choose a small $\xi=\xi(\varepsilon,k,\Omega)>0$ and a smaller $\delta=\delta(\xi)>0$. Then (2.16) implies that there is $J\subseteq[\#\mathcal{V}]$ satisfying
\[
\sum_{j\in J}|V_j|\geq(1-\xi)n \tag{2.17}
\]
such that for all $j\in J$, $U\subseteq V_j$, $|U|\geq\xi|V_j|$ we have
\[
\big\langle\|\sigma[\,\cdot\,|U]-\bar\mu_j\|_{TV}\big\rangle_\mu\leq\xi. \tag{2.18}
\]

In particular, we claim that (2.18) implies the following (if $\xi$ is small enough):
\[
\forall\omega\in\Omega,\ j\in J,\ \Sigma\subseteq\Omega^n\text{ with }\mu(\Sigma)\geq\xi^{1/4}:\quad
\big|\{x\in V_j:|\langle\sigma[\omega|x]\,|\,\Sigma\rangle_\mu-\bar\mu_j(\omega)|>\xi^{1/4}\}\big|\leq\xi^{1/4}|V_j|. \tag{2.19}
\]

Indeed, assume that $\langle\mathbf{1}\{\sigma\in\Sigma\}\rangle_\mu\geq\xi^{1/4}$ and $|\{x\in V_j:|\langle\sigma[\omega_0|x]\,|\,\Sigma\rangle_\mu-\bar\mu_j(\omega_0)|>\xi^{1/4}\}|>\xi^{1/4}|V_j|$ for some $\omega_0\in\Omega$. Then, because $\langle\sigma[\,\cdot\,|x]\,|\,\Sigma\rangle_\mu$ is a probability measure on $\Omega$ for every $x$, there exists $\omega\in\Omega$ such that the set $U=\{x\in V_j:\langle\sigma[\omega|x]\,|\,\Sigma\rangle_\mu<\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|\}$ has size $|U|>\xi^{1/4}|V_j|/(2|\Omega|)$. In particular, $\langle\sigma[\omega|U]\,|\,\Sigma\rangle_\mu\leq\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|$. Therefore, by Markov's inequality,
\[
\big\langle\mathbf{1}\{\sigma[\omega|U]\leq\bar\mu_j(\omega)-\xi^{1/3}\}\,\big|\,\Sigma\big\rangle_\mu
\geq1-\frac{\bar\mu_j(\omega)-\xi^{1/4}/|\Omega|}{\bar\mu_j(\omega)-\xi^{1/3}}
\geq\frac{\xi^{1/4}/|\Omega|-\xi^{1/3}}{1-\xi^{1/3}}\geq\xi^{1/4}/(2|\Omega|).
\]
Consequently, we obtain
\[
\big\langle\|\sigma[\,\cdot\,|U]-\bar\mu_j\|_{TV}\big\rangle_\mu\geq\xi^{1/3+1/4}\,\langle\mathbf{1}\{\sigma\in\Sigma\}\rangle_\mu/(2|\Omega|)\geq\xi^{7/8}.
\]
Since $|U|>\xi^{1/4}|V_j|/(2|\Omega|)>\xi|V_j|$, this contradicts (2.18).

Now, fix any $\omega_1,\dots,\omega_k\in\Omega$ and let $x_1,\dots,x_k\in[n]$ be chosen independently and uniformly at random. Let $\Sigma_h=\Sigma_h(x_1,\dots,x_h)\subseteq\Omega^n$ be the event that $\sigma(x_i)=\omega_i$ for all $i\leq h$. We are going to show that for $0\leq h<k$,
\[
\mathrm{E}\big[\mu(\Sigma_h)\,\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\big]<\xi^{1/5}. \tag{2.20}
\]

In the case $h=0$ there is nothing to show. As for the inductive step, condition on $x_1,\dots,x_h$.

  • Case 1: $\mu(\Sigma_h)\leq\xi^{1/4}$. Regardless of the choice of $x_{h+1}$ we have
\[
\mu(\Sigma_h)\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\leq\xi^{1/4}.
\]
  • Case 2: $\mu(\Sigma_h)>\xi^{1/4}$. Due to (2.17), with probability at least $1-2\xi$ we have $x_{h+1}\in V_j\setminus\{x_1,\dots,x_h\}$ for some $j\in J$. Hence, (2.19) implies $\mathrm{E}_{x_{h+1}}\big[|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu|\big]\leq\xi^{1/4}$.

Hence, (2.20) follows.

To complete the proof, we are going to show by induction on $h\in[k]$ that
\[
\mathrm{E}\Big|\Big\langle\prod_{i=1}^h\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^h\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\leq h\,\xi^{1/5}. \tag{2.21}
\]
For $h=1$ there is nothing to show. To proceed from $h$ to $h+1$ we use the triangle inequality to write
\[
\begin{aligned}
\mathrm{E}\Big[\Big|\Big\langle\prod_{i=1}^{h+1}\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^{h+1}\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\Big]
&\leq\mathrm{E}\big[\mu(\Sigma_h)\big|\langle\sigma[\omega_{h+1}|x_{h+1}]\,|\,\Sigma_h\rangle_\mu-\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\big|\big]\\
&\quad+\mathrm{E}\Big[\langle\sigma[\omega_{h+1}|x_{h+1}]\rangle_\mu\Big|\Big\langle\prod_{i=1}^{h}\sigma[\omega_i|x_i]\Big\rangle_\mu-\prod_{i=1}^{h}\langle\sigma[\omega_i|x_i]\rangle_\mu\Big|\Big].
\end{aligned}
\]
Invoking the induction hypothesis and (2.20) completes the proof.

Proof of Corollary 2.2

For a small enough $\delta=\delta(\varepsilon,k)>0$ let $(\mathcal{V},\mathcal{S})$ be a pair of partitions of size at most $N=N(\delta,\Omega)$ such that $\mu$ is $\delta/2$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$, as guaranteed by Theorem 2.1. Let $\eta=\varepsilon/(2N)$ and let $J$ be the set of all $j\in[\#\mathcal{S}]$ such that $\mu(S_j)\geq\eta$ and such that $\mu[\,\cdot\,|S_j]$ is $\delta$‐regular with respect to $\mathcal{V}$. Then
\[
\sum_{j\in[\#\mathcal{S}]\setminus J}\mu(S_j)\leq\delta+\varepsilon/2<\varepsilon.
\]
Furthermore, for every $j\in J$ the measure $\mu[\,\cdot\,|S_j]$ satisfies (2.16) due to HM2. Therefore, Lemma 2.8 implies that $\mu[\,\cdot\,|S_j]$ is $(\varepsilon,k)$‐symmetric. Consequently, the sets $(S_j)_{j\in J}$ are pairwise disjoint $(\varepsilon,k)$‐states with $\mu(S_j)\geq\eta$ for all $j\in J$ and $\sum_{j\in J}\mu(S_j)\geq1-\varepsilon$.

2.5. Proof of Corollary 2.3

Pick small enough $\delta=\delta(\varepsilon,k,\Omega)$, $\gamma=\gamma(\delta,\xi)$, $\eta=\eta(\gamma)>0$. Then by Theorem 2.1, $\mu$ is $\gamma$‐homogeneous with respect to $(\mathcal{V},\mathcal{S})$ for partitions that satisfy $\#\mathcal{V}+\#\mathcal{S}\leq N=N(\gamma)$. Let $J\subseteq[\#\mathcal{S}]$ contain all $j$ such that $\mu[\,\cdot\,|S_j]$ is $\gamma$‐regular with respect to $\mathcal{V}$ and such that $\mu(S_j)\geq\eta$. Let $\bar\mu_{i,j}=\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}$. Then by HM2, for every $j\in J$ we have
\[
\frac1n\sum_{i\in[\#\mathcal{V}]}|V_i|\,\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_{i,j}\|_{TV}\big\rangle_{\mu[\cdot|S_j]}<3\gamma.
\]

Therefore, Lemma 2.8 implies that $S_j$ is a $(\xi,2)$‐state. Consequently, our assumption (2.3) and the triangle inequality entail that for all $j,j'\in J$,
\[
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\|\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}-\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_{j'}]}\big\|_{TV}<\delta. \tag{2.22}
\]

Choosing $\eta$ small, we can ensure that $\sum_{j\notin J}\mu(S_j)\leq\delta$. Therefore, letting $\bar\mu_i=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$, we obtain from (2.22)
\[
\begin{aligned}
\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu
&\leq\delta+\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\sum_{j\in J}\mu(S_j)\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_{\mu[\cdot|S_j]}\\
&\leq2\delta+\sum_{i\in[\#\mathcal{V}]}\frac{|V_i|}{n}\sum_{j\in J}\mu(S_j)\big\|\langle\sigma[\,\cdot\,|V_i]\rangle_{\mu[\cdot|S_j]}-\bar\mu_i\big\|_{TV}\quad[\text{by HM2}]\\
&\leq5\delta. \tag{2.23}
\end{aligned}
\]
Since $\mu$ is $\gamma$‐regular and thus $5\delta$‐regular w.r.t. $\mathcal{V}$ by HM4, (2.23) and Lemma 2.8 imply that $\mu$ is $(\varepsilon,k)$‐symmetric.

2.6. Proof of Corollary 2.4

Choose $\delta=\delta(\varepsilon,\eta)$ small enough, assume that $S\subseteq\Omega^n$ satisfies $\mu(S)\geq\eta$ and that $\mu$ is $(\delta,2)$‐symmetric. Assume for contradiction that
\[
\frac1n\sum_{x\in[n]}\|\mu_x[\,\cdot\,|S]-\mu_x\|_{TV}>\varepsilon. \tag{2.24}
\]

Let
\[
W=\Big\{x\in[n]:\|\mu_x[\,\cdot\,|S]-\mu_x\|_{TV}\geq\varepsilon/2\Big\},\qquad
W^{\pm1}(\omega)=\Big\{x\in W:\pm\big(\mu_x[\omega|S]-\mu_x(\omega)\big)\geq\frac{\varepsilon}{4|\Omega|}\Big\}\quad(\omega\in\Omega).
\]
Then (2.24) entails that $|W|\geq\varepsilon n/2$. Therefore, there is $\omega\in\Omega$ such that $|W^s(\omega)|\geq\varepsilon n/(4|\Omega|)$ for either $s=+1$ or $s=-1$. Let $W=W^s(\omega)$ for the sake of brevity. Of course, by the definition of $W$,

\[
\big(\langle\sigma[\omega|W]\rangle_{\mu[\cdot|S]}-\langle\sigma[\omega|W]\rangle_\mu\big)^2\geq\frac{\varepsilon^2}{16|\Omega|^2}. \tag{2.25}
\]

Since $\mu$ is $(\delta,2)$‐symmetric,
\[
\big\langle(\sigma[\omega|W]-\langle\tau[\omega|W]\rangle_\mu)^2\big\rangle_\mu=\frac{1}{|W|^2}\sum_{x,y\in W}\big[\langle\sigma[\omega|x]\,\sigma[\omega|y]\rangle_\mu-\langle\tau[\omega|x]\rangle_\mu\langle\tau[\omega|y]\rangle_\mu\big]\leq\frac{16\,\delta|\Omega|^2}{\varepsilon^2}. \tag{2.26}
\]

On the other hand we have
\[
\big\langle(\sigma[\omega|W]-\langle\tau[\omega|W]\rangle_\mu)^2\big\rangle_\mu\geq\mu(S)\big(\langle\tau[\omega|W]\rangle_{\mu[\cdot|S]}-\langle\tau[\omega|W]\rangle_\mu\big)^2. \tag{2.27}
\]

Finally, plugging (2.25) and (2.26) into (2.27), we find $16\delta|\Omega|^2/\varepsilon^2\geq\eta\varepsilon^2/(32|\Omega|^2)$, which is a contradiction if $\delta$ is small enough.

2.7. Proof of Proposition 2.5

Choose small enough $\alpha=\alpha(\varepsilon,\Omega)$, $\gamma=\gamma(\alpha)>0$, $\chi=\chi(\gamma)>0$ and an even smaller $\delta=\delta(\gamma,\chi)>0$ and assume that $\mu$ is $(\delta,2)$‐symmetric. Suppose that $\mu$ is $\chi$‐homogeneous with respect to a partition $(\mathcal{V},\mathcal{S})$ such that $\#\mathcal{V}+\#\mathcal{S}\leq N=N(\gamma)$, as promised by Theorem 2.1. Let $J$ be the set of all $j\in[\#\mathcal{S}]$ such that $\mu(S_j)\geq\gamma^2/N$. Moreover, let $I$ be the set of all $i\in[\#\mathcal{V}]$ such that $\mu$ is $\chi$‐regular on $V_i$ and $|V_i|\geq\gamma n/N$. By Corollary 2.4 we have
\[
\frac{1}{|V_i|}\sum_{x\in V_i}\|\mu_x[\,\cdot\,|S_j]-\mu_x\|_{TV}<\gamma\quad\text{for all }i\in I,\ j\in J,
\]
provided that $\delta$ is chosen small enough. Therefore, letting $\bar\mu_i=\langle\sigma[\,\cdot\,|V_i]\rangle_\mu$, for all $i\in I$ we have
\[
\big\langle\|\sigma[\,\cdot\,|V_i]-\bar\mu_i\|_{TV}\big\rangle_\mu<2\gamma. \tag{2.28}
\]

Fix some $i\in I$. We claim that $\mu\otimes\mu$ is $\alpha$‐regular on $V_i$. Hence, let $U\subseteq V_i$ be a set of size $|U|\geq\alpha|V_i|$ and let
\[
\mathcal{E}=\big\{\|\sigma[\,\cdot\,|U]-\bar\mu_i\|_{TV}<\gamma^{1/3}\big\}.
\]
Then (2.28) implies that $\langle\mathbf{1}\{\sigma\notin\mathcal{E}\}\rangle_\mu<\gamma^{1/3}$, because $\mu$ is $\gamma$‐regular on $V_i$. Now, fix some $\sigma\in\mathcal{E}$. For $\omega\in\Omega$ let $U(\sigma,\omega)=\{x\in U:\sigma(x)=\omega\}$. Let
\[
\mathcal{E}(\sigma,\omega)=\big\{\|\tau[\,\cdot\,|U(\sigma,\omega)]-\bar\mu_i\|_{TV}<\gamma^{1/3}\big\}.
\]
If $|U(\sigma,\omega)|\geq\gamma^{1/2}|U|$, then due to (2.28) and $\gamma$‐regularity we obtain, by a similar token as previously, $\langle\mathbf{1}\{\tau\notin\mathcal{E}(\sigma,\omega)\}\rangle_\mu\leq\gamma^{1/3}$. Consequently, the event $\mathcal{E}(\sigma)$ that $\mathcal{E}(\sigma,\omega)$ occurs for all $\omega$ satisfying $|U(\sigma,\omega)|\geq\gamma^{1/2}|U|$ has probability at least $1-|\Omega|\gamma^{1/3}$. Therefore, for any $\omega,\omega'\in\Omega$ we obtain

\[
\begin{aligned}
&\Big\langle\Big|\frac{1}{|U|}\sum_{x\in U}\mathbf{1}\{\sigma(x)=\omega\}\mathbf{1}\{\tau(x)=\omega'\}-\bar\mu_i(\omega)\bar\mu_i(\omega')\Big|\Big\rangle_{\mu\otimes\mu}\\
&\qquad\leq(|\Omega|+1)\gamma^{1/3}+\Big\langle\Big|\frac{1}{|U|}\sum_{x\in U}\mathbf{1}\{\sigma(x)=\omega\}\mathbf{1}\{\tau(x)=\omega'\}-\bar\mu_i(\omega)\bar\mu_i(\omega')\Big|\ \Big|\ \sigma\in\mathcal{E},\tau\in\mathcal{E}(\sigma)\Big\rangle_{\mu\otimes\mu}\\
&\qquad\leq\gamma^{1/7}+\Big\langle\max_{\omega:|U(\sigma,\omega)|\geq\gamma^{1/2}|U|}\big|\tau[\omega'|U(\sigma,\omega)]-\bar\mu_i(\omega')\big|\ \Big|\ \sigma\in\mathcal{E},\tau\in\mathcal{E}(\sigma)\Big\rangle_{\mu\otimes\mu}\leq\gamma^{1/8}.
\end{aligned}
\]

Summing over all $\omega,\omega'$ and choosing $\gamma$ small enough, we conclude that $\mu\otimes\mu$ is $\alpha$‐regular on $V_i$.

Finally, (2.28) implies that $\mu\otimes\mu$ satisfies
\[
\big\langle\|(\sigma\otimes\tau)[\,\cdot\,|V_i]-\bar\mu_i\otimes\bar\mu_i\|_{TV}\big\rangle_{\mu\otimes\mu}<\alpha.
\]

Therefore, picking α small enough, we can apply Lemma 2.8 to conclude that μμ is (ε,2)‐symmetric.

3. FACTOR GRAPHS

3.1. Examples

The aim in this section is to set up a comprehensive framework for the study of “random factor graphs” and their corresponding Gibbs measures. To get started let us ponder a few concrete examples.

In the Ising model on a graph $G=(V,E)$ the variables of the problem are just the vertices of the graph. The values available to each variable are $\pm1$. Thus, an assignment is simply a map $\sigma:V\to\{\pm1\}$. Moreover, each edge of $G$ gives rise to a constraint. Specifically, given a parameter $\beta>0$ we define a weight function $\psi_e$ corresponding to the edge $e=\{v,w\}$ by letting $\psi_e(\sigma)=\exp(\beta\sigma(v)\sigma(w))$. Thus, an edge $e=\{v,w\}$ gives larger weight to assignments $\sigma$ with $\sigma(v)=\sigma(w)$ than to those with $\sigma(v)\neq\sigma(w)$. The corresponding partition function reads

\[
Z_\beta(G)=\sum_{\sigma:V\to\{\pm1\}}\prod_{e\in E}\psi_e(\sigma)=\sum_{\sigma:V\to\{\pm1\}}\exp\Big[\beta\sum_{\{v,w\}\in E}\sigma(v)\sigma(w)\Big].
\]

Further, the Gibbs distribution $\mu_{G,\beta}$ induced by $G,\beta$ is the probability measure on $\{\pm1\}^V$ defined by

\[
\mu_{G,\beta}(\sigma)=\frac{1}{Z_\beta(G)}\prod_{e\in E}\psi_e(\sigma)=\frac{1}{Z_\beta(G)}\exp\Big[\beta\sum_{\{v,w\}\in E}\sigma(v)\sigma(w)\Big].
\]

Thus, $\mu_{G,\beta}$ weighs assignments according to the number of edges $e=\{v,w\}$ such that $\sigma(v)=\sigma(w)$.
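For a small graph the Ising partition function can be checked by brute force; a minimal sketch (ours):

```python
# Brute-force Ising partition function (ours) on a small graph.
import itertools, math

def ising_Z(n, edges, beta):
    """Z_beta(G) = sum over sigma in {-1,+1}^n of
    exp(beta * sum over edges {v,w} of sigma(v)*sigma(w))."""
    return sum(math.exp(beta * sum(s[v] * s[w] for v, w in edges))
               for s in itertools.product((-1, 1), repeat=n))

print(ising_Z(4, [(0, 1), (1, 2), (2, 3), (3, 0)], beta=0.5))  # a 4-cycle
```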

The Ising model has been studied extensively in the mathematical physics literature on various classes of graphs, including and particularly random graphs. For instance, if $G(n,d)$ is a random regular graph of degree $d$ on $n$ vertices, then $Z_\beta(G(n,d))$ is known to “converge” to the value predicted by the cavity method 22. Formally, the cavity method yields a certain number $F(\beta,d)$ such that
\[
\lim_{n\to\infty}\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]=F(\beta,d). \tag{3.1}
\]

Because $Z_\beta(G(n,d))$ is exponential in $n$ with high probability, the scaling applied in (3.1) is the appropriate one to obtain a finite limit. Furthermore, by Azuma's inequality $\ln Z_\beta(G(n,d))$ is concentrated about its expectation. Therefore, (3.1) implies that $\frac1n\ln Z_\beta(G(n,d))$ converges to $F(\beta,d)$ in probability.

The Potts antiferromagnet on a graph $G=(V,E)$ can be viewed as a twist on the Ising model. In this case we look at assignments $\sigma:V\to[k]$ for some number $k\geq3$. The weight functions associated with the edges are defined by $\psi_e(\sigma)=\exp(-\beta\mathbf{1}\{\sigma(v)=\sigma(w)\})$ for some $\beta>0$. Thus, this time the edges prefer that the incident vertices receive different values. The Gibbs measure and the partition function read
\[
\mu_{G,\beta}(\sigma)=\frac{1}{Z_\beta(G)}\exp\Big[-\beta\sum_{\{v,w\}\in E}\mathbf{1}\{\sigma(v)=\sigma(w)\}\Big],\qquad
Z_\beta(G)=\sum_{\sigma:V\to[k]}\exp\Big[-\beta\sum_{\{v,w\}\in E}\mathbf{1}\{\sigma(v)=\sigma(w)\}\Big].
\]

While it is known that $\lim_n\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]$ exists and that $\ln Z_\beta(G(n,d))$ is concentrated about its expectation 15, the precise value remains elusive for a wide range of $d,\beta$ (in contrast to the ferromagnetic version of the model 24). However, it is not difficult to see that for sufficiently large values of $d,\beta$ we have 12
\[
\lim_{n\to\infty}\frac1n\mathrm{E}[\ln Z_\beta(G(n,d))]<\lim_{n\to\infty}\frac1n\ln\mathrm{E}[Z_\beta(G(n,d))].
\]

Hence, just like in the random k‐SAT model the first moment overshoots the actual value of the partition function by an exponential factor. The Potts model is closely related to the k‐colorability problem. Indeed, if we think of the k possible values as colors, then for large β the Gibbs measure concentrates on colorings with few monochromatic edges.

As a third example let us consider the following version of the random k‐SAT model. Let $k\geq3,\Delta>1$ be fixed integers, let $V_n=\{x_1,\dots,x_n\}$ be a set of Boolean variables and let $d_n:V_n\times\{\pm1\}\to[\Delta]$ be a map such that
\[
m=\sum_{x\in V_n}\big(d_n(x,1)+d_n(x,-1)\big)/k
\]
is an integer. Then we let $\Phi(n,k,d_n)$ be a random k‐CNF formula with $m$ clauses in which each variable $x\in V_n$ appears precisely $d_n(x,1)$ times as a positive literal and precisely $d_n(x,-1)$ times as a negative literal. As in Section 1, for a clause $a$ and a truth assignment $\sigma:V\to\{0,1\}$ we let $\psi_a(\sigma)=\exp(-\beta\mathbf{1}\{\sigma\text{ violates }a\})$. Then for a given parameter $\beta>0$ we obtain a Gibbs measure that weighs assignments by the number of clauses that they violate and a corresponding partition function $Z_\beta(\Phi(n,k,d_n))$, cf. (1.1), (1.2). Hence, for given $\beta>0,k\geq3$ and degree assignments $(d_n)_n$ the problem of determining $\lim_n\frac1n\mathrm{E}[\ln Z_\beta(\Phi(n,k,d_n))]$ arises. This question is anything but straightforward, even in the special case that $d_n(x,\pm1)=d_0$ is the same for all $x$. In 11 we show how the results of the present paper can be put to work to tackle this case.

3.2. Random Factor Graphs

The following definition encompasses a variety of concrete models.

Definition 3.1

Let $\Delta>0$ be an integer, let $\Omega,\Theta$ be finite sets and let $\Psi=\{\psi_1,\dots,\psi_l\}$ be a finite set of functions $\psi_i:\Omega^{h_i}\to(0,\infty)$ of arity $h_i\in[\Delta]$. A $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}=(V,F,d,t,(\psi_a)_{a\in F})$ consists of

  • M1: a countable set $V$ of variable nodes,

  • M2: a countable set $F$ of constraint nodes,

  • M3: a map $d:V\cup F\to[\Delta]$ such that
\[
\sum_{x\in V}d(x)=\sum_{a\in F}d(a), \tag{3.2}
\]
  • M4: a map $t:C_V\cup C_F\to\Theta$, where we let
\[
C_V=\bigcup_{x\in V}\{x\}\times[d(x)],\qquad C_F=\bigcup_{a\in F}\{a\}\times[d(a)],
\]
such that
\[
|t^{-1}(\theta)\cap C_V|=|t^{-1}(\theta)\cap C_F|\quad\text{for each }\theta\in\Theta, \tag{3.3}
\]
  • M5: a map $F\to\Psi$, $a\mapsto\psi_a$, such that $\psi_a:\Omega^{d(a)}\to(0,\infty)$ for all $a\in F$.

The size of the model is $\#\mathcal{M}=|V|$. Furthermore, an $\mathcal{M}$‐factor graph is a bijection $G:C_V\to C_F$, $(x,i)\mapsto G(x,i)$, such that $t(G(x,i))=t(x,i)$ for all $(x,i)\in C_V$.

Of course, (3.2) and (3.3) require that either both quantities are infinite or both are finite.

The semantics is that Δ is the maximum degree of a factor graph. Moreover, Ω is the set of possible values that the variables of the model range over, e.g., the set {±1} in the Ising model. Further, Θ is a set of “types”. For instance, in the random k‐SAT model the types can be used to specify the signs of the literals. Additionally, Ψ is a set of possible weight functions.

A model $\mathcal{M}$ comes with a set $V$ of variable nodes and a set $F$ of constraint nodes. The degrees of these nodes are prescribed by the map $d$. Just like in the “configuration model” of graphs with a given degree sequence, we create $d(v)$ “clones” of each node $v$. The sets $C_V$, $C_F$ contain the clones of the variable and constraint nodes, respectively. Further, the map $t$ assigns a type to each “clone” of either a constraint or variable node, and each constraint node $a$ comes with a weight function $\psi_a$.

An $\mathcal{M}$‐factor graph is a type‐preserving matching $G$ of the variable and constraint clones. Let $\mathcal{G}(\mathcal{M})$ be the set of all $\mathcal{M}$‐factor graphs and write $\mathbf{G}=\mathbf{G}(\mathcal{M})$ for a uniformly random sample from $\mathcal{G}(\mathcal{M})$. Contracting the clones of each node, we obtain a bipartite (multi‐)graph with variable nodes $V$ and constraint nodes $F$. We often identify $G$ with this multi‐graph. For instance, if we speak of the distance of two vertices in $G$ we mean the length of a shortest path in this multi‐graph.

For a clone $(x,i)\in C_V$ we denote by $\partial(G,x,i)=G(x,i)$ the clone that $G$ matches $(x,i)$ to. Similarly, for $(a,j)\in C_F$ we write $\partial(G,a,j)$ for the variable clone $(x,i)$ such that $\partial(G,x,i)=(a,j)$. Moreover, for a variable $x$ we let $\partial(G,x)=\{\partial(G,x,i):i\in[d(x)]\}$ and analogously for $a\in F$ we set $\partial(G,a)=\{\partial(G,a,j):j\in[d(a)]\}$. To economise notation we sometimes identify a clone $(x,i)$ with the underlying variable $x$. For instance, if $\sigma:V\to\Omega$ is an assignment, then we take the liberty of writing $\sigma(x,i)=\sigma(x)$. Additionally, where convenient we view $\partial(G,x)$ as the set of all constraint nodes $a\in F$ such that there exist $i\in[d(x)]$, $j\in[d(a)]$ with $\partial(G,x,i)=(a,j)$. The corresponding convention applies to $\partial(G,a)$.

An $\mathcal{M}$‐assignment is a map $\sigma:V\to\Omega$ and we define
\[
\psi_{G,a}(\sigma)=\psi_a\big(\sigma(G(a,1)),\dots,\sigma(G(a,d(a)))\big)\quad\text{for }a\in F,\qquad
\psi_G(\sigma)=\prod_{a\in F}\psi_{G,a}(\sigma).
\]

Further, the Gibbs distribution and the partition function of $G$ are
\[
\mu_G(\sigma)=\psi_G(\sigma)/Z(G),\quad\text{where }Z(G)=\sum_{\sigma:V\to\Omega}\psi_G(\sigma). \tag{3.4}
\]

We denote expectations with respect to the Gibbs measure by $\langle\cdot\rangle_G=\langle\cdot\rangle_{\mu_G}$.
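To make (3.4) concrete, here is a deliberately simplified sketch (ours) in which a factor graph is handed over directly as a list of constraints — each an ordered tuple of incident variables together with its weight function — glossing over the clone-and-type bookkeeping of Definition 3.1.

```python
# Simplified generic partition function (ours), cf. (3.4).
import itertools, math

def Z(n, Omega, constraints):
    """constraints: list of (variables, psi) with psi defined on tuples
    over Omega; returns sum_sigma prod_a psi_a(sigma restricted to a)."""
    return sum(math.prod(psi(tuple(sigma[x] for x in xs))
                         for xs, psi in constraints)
               for sigma in itertools.product(Omega, repeat=n))

beta = 0.5
ising_edge = lambda s: math.exp(beta * s[0] * s[1])  # arity-two weight
print(Z(3, (-1, 1), [((0, 1), ising_edge), ((1, 2), ising_edge)]))
```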

The fundamental problem that arises is the study of the random variable $\ln Z(\mathbf{G})$. As mentioned in Section 1, this random variable holds the key to getting a handle on the Gibbs measure and thus the combinatorics of the problem. The following proposition establishes concentration about the expectation. For two factor graphs $G,G'\in\mathcal{G}(\mathcal{M})$ let
\[
{\rm dist}(G,G')=|\{(x,i)\in C_V:\partial(G,x,i)\neq\partial(G',x,i)\}|. \tag{3.5}
\]

Proposition 3.2

For any $\Delta,\Omega,\Theta,\Psi$ there exists $\eta=\eta(\Delta,\Omega,\Theta,\Psi)>0$ such that for any $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}$ of size $n=\#\mathcal{M}\geq1/\eta$ and any $\varepsilon>0$ we have
\[
P\big[|\ln Z(\mathbf{G})-\mathrm{E}[\ln Z(\mathbf{G})]|>\varepsilon n\big]\leq\exp(-\eta\varepsilon^2n).
\]

There exists a number $\rho>0$ that depends on $\Delta,\Omega,\Psi,\Theta$ only such that for any two factor graphs $G,G'\in\mathcal{G}(\mathcal{M})$ we have $|\ln Z(G)-\ln Z(G')|\leq\rho\cdot{\rm dist}(G,G')$. Therefore, the assertion follows from Azuma's inequality.

Thus, Proposition 3.2 reduces our task to calculating the expectation $\mathrm{E}[\ln Z(\mathbf{G})]$. Generally, the standard first and second moment methods do not suffice to tackle this problem because the logarithm sits inside the expectation. While, of course, Jensen's inequality guarantees that
\[
\mathrm{E}[\ln Z(\mathbf{G})]\leq\ln\mathrm{E}[Z(\mathbf{G})], \tag{3.6}
\]
equality does not typically hold. In fact, we already saw examples where $\ln\mathrm{E}[Z(\mathbf{G})]-\mathrm{E}[\ln Z(\mathbf{G})]$ is linear in the size $\#\mathcal{M}$ of the model. If so, then the Paley‐Zygmund inequality (1.3) entails that $\ln(\mathrm{E}[Z(\mathbf{G})^2]/\mathrm{E}[Z(\mathbf{G})]^2)$ is linear in $\#\mathcal{M}$ as well, dooming the second moment method. Furthermore, even if $\mathrm{E}[\ln Z(\mathbf{G})]\sim\ln\mathrm{E}[Z(\mathbf{G})]$ the second moment method does not generally succeed 20. Let us now revisit the examples from Section 3.1.

Example 3.3

(The Ising model on the random d‐regular graph.) Suppose that $d\geq2,\beta>0$. Let $\Delta=d$, $\Omega=\{\pm1\}$, $\Psi=\{\psi\}$, where $\psi:\{\pm1\}^2\to(0,\infty)$, $(\sigma_1,\sigma_2)\mapsto\exp(\beta\sigma_1\sigma_2)$, and set $\Theta=\{0\}$. Further, given $n\geq1$ such that $dn$ is even we define a $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}(d,n)$ by letting $V=\{x_1,\dots,x_n\}$, $F=\{a_1,\dots,a_{dn/2}\}$, $d(x)=d$ for all $x\in V$, $d(a)=2$ for all $a\in F$, $t(x,i)=t(f,j)=0$ for all $(x,i)\in C_V,(f,j)\in C_F$, and $\psi_a=\psi$ for all $a\in F$. Thus, all clones have the same “type” and all constraint nodes have arity two and the same weight function. Hence, the random graph $\mathbf{G}(\mathcal{M})$ is obtained by matching the $dn$ variable clones randomly to the $dn$ constraint clones. If we simply replace the constraint nodes, which have degree two, by edges joining the two adjacent variable nodes, then the resulting random multigraph is contiguous to the uniformly random d‐regular graph on $n$ vertices. In the model $\mathcal{M}$, (3.6) holds with (asymptotic) equality for all $d,\beta$ 22.

Example 3.4

(The Potts antiferromagnet on the random d‐regular graph.) The construction is similar to the previous example, except that $\Omega=[k]$ is the set of colors and $\psi(\sigma_1,\sigma_2)=\exp(-\beta\mathbf{1}\{\sigma_1=\sigma_2\})$. In this example (3.6) holds with asymptotic equality if either $d\leq d_0(k)$, or $d>d_0(k)$ and $\beta\leq\beta_0(d,k)$, for certain critical values $d_0(k),\beta_0(d,k)$. However, for sufficiently large $d,\beta$ there occurs a linear gap 12, 21.

Example 3.5

(Random k‐SAT.) To capture the random k‐SAT model we let $\Delta>0$ be a maximum degree and $\Omega=\Theta=\{\pm1\}$. Further, each $s\in\{\pm1\}^k$ gives rise to a function
\[
\psi_s:\{\pm1\}^k\to(0,\infty),\qquad\sigma\mapsto\exp(-\beta\mathbf{1}\{\sigma=s\})
\]
and we let $\Psi=\{\psi_s:s\in\{\pm1\}^k\}$. The idea is that $s$ is the “sign pattern” of a k‐clause, with $s_i=\pm1$ indicating that the $i$th literal is positive/negative. Then a truth assignment $\sigma$ of the $k$ variables is satisfying unless $\sigma_i=s_i$ for all $i$. The corresponding model $\mathcal{M}$ has a set $V=\{x_1,\dots,x_n\}$ of Boolean variables and a set $F=\{a_1,\dots,a_m\}$ of clauses. Moreover, the map $d:V\to[\Delta]$ prescribes the degree of each variable, while of course each clause has degree $k$. Additionally, the map $t:C_V\cup C_F\to\Theta=\{\pm1\}$ prescribes the positive/negative occurrences of the variables and the sign patterns of the clauses. Thus, a variable $x$ occurs $|\{i\in[d(x)]:t(x,i)=\pm1\}|$ times positively/negatively and the $j$th literal of a clause $a$ is positive iff $t(a,j)=1$. Finally, the weight function of clause $a$ is $\psi_{(t(a,1),\dots,t(a,k))}$. The bound (3.6) does not generally hold with equality 5, 11.

While Definition 3.1 encompasses many problems of interest, there are two restrictions. First, because all weight functions $\psi\in\Psi$ take strictly positive values, Definition 3.1 does not allow for “hard” constraints. For instance, Definition 3.1 does not accommodate the graph coloring problem, which imposes the strict requirement that no single edge be monochromatic. However, for some purposes hard constraints can be approximated by soft ones, e.g., by choosing a very large value of $\beta$ in the Potts antiferromagnet. Moreover, many of the arguments in the following sections do extend to hard constraints with a bit of care. However, the assumption that all $\psi$ are strictly positive saves us many case distinctions as it ensures that $Z(G)$ is strictly positive and that therefore the Gibbs measure is well‐defined.

The second restriction is that we prescribe a fixed maximum degree $\Delta$. Thus, if we consider a sequence $\underline{\mathcal{M}}=(\mathcal{M}_n)_n$ of $(\Delta,\Omega,\Psi,\Theta)$‐models with $\#\mathcal{M}_n=n$, then all factor graphs have a bounded degree. By comparison, if we choose a k‐SAT formula with $n$ variables and $m=\alpha n/k$ clauses uniformly at random for fixed $k\geq3,\alpha>0$, then the maximum variable degree will be of order $\ln n/\ln\ln n$. Yet this case can be approximated well by a sequence of models with a large enough maximum degree $\Delta$. In fact, if we calculate $\mathrm{E}[\ln Z]$ for any fixed $\Delta$, then the $\Delta\to\infty$ limit is easily seen to yield the answer in the case of uniformly random formulas. Nevertheless, the bounded degree assumption is technically convenient because it facilitates the use of local weak convergence, as we will discuss next.

Remark 3.6

For the sake of simplicity in (3.4) we defined the partition function as the sum over all $\sigma:V\to\Omega$. However, the results stated in the following carry over to the cases where $Z$ is defined as the sum over all configurations in a subset $C_{\mathcal{M}}\subseteq\Omega^V$, e.g., all $\sigma$ that have Hamming distance at most $\alpha n$ from some reference assignment $\sigma_0$ for a fixed $\alpha>0$. Of course, in this case the Gibbs measure is defined such that its support is equal to $C_{\mathcal{M}}$.

3.3. Local Weak Convergence

Suppose that we fix $\Delta,\Omega,\Psi,\Theta$ as in Definition 3.1 and that $\underline{\mathcal{M}}=(\mathcal{M}_n)_n$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models such that $\mathcal{M}_n=(V_n,F_n,d_n,t_n,(\psi_a)_{a\in F_n})$ has size $n$. Let us write $\mathbf{G}=\mathbf{G}(\mathcal{M}_n)$ for the sake of brevity. According to the cavity method, $\lim_n\frac1n\mathrm{E}[\ln Z(\mathbf{G})]$ is determined by the “limiting local structure” of the random factor graph $\mathbf{G}$. To formalise this concept, we adapt the notion of local weak convergence of graph sequences 8, 35 to our current setup, thereby generalising the approach taken in 23.

Definition 3.7

A $(\Delta,\Omega,\Psi,\Theta)$‐template consists of a $(\Delta,\Omega,\Psi,\Theta)$‐model $\mathcal{M}$, a connected factor graph $H\in\mathcal{G}(\mathcal{M})$ and a root $r_H$, which is a variable or factor node. Its size is $\#\mathcal{M}$. Moreover, two templates $H,H'$ with models $\mathcal{M}=(V,F,d,t,(\psi_a))$, $\mathcal{M}'=(V',F',d',t',(\psi'_a))$ are isomorphic if there exists a bijection $\pi:V\cup F\to V'\cup F'$ such that

  • ISM1: $\pi(r_H)=r_{H'}$,

  • ISM2: $\pi(V)=V'$ and $\pi(F)=F'$,

  • ISM3: $d(v)=d'(\pi(v))$ for all $v\in V\cup F$,

  • ISM4: $t(v,i)=t'(\pi(v),i)$ for all $(v,i)\in C_V\cup C_F$,

  • ISM5: $\psi_a=\psi'_{\pi(a)}$ for all $a\in F$, and

  • ISM6: if $(x,i)\in C_V$, $(a,j)\in C_F$ satisfy $\partial(H,x,i)=(a,j)$, then $\partial(H',\pi(x),i)=(\pi(a),j)$.

Thus, a template is, basically, a finite or countably infinite connected factor graph with a distinguished root. Moreover, an isomorphism preserves the root as well as degrees, types, weight functions and adjacencies.

Let us write $[H]$ for the isomorphism class of a template and let $\mathfrak{G}=\mathfrak{G}(\Delta,\Omega,\Theta,\Psi)$ be the set of all isomorphism classes of $(\Delta,\Omega,\Psi,\Theta)$‐templates. For each $[H]\in\mathfrak{G}$ and $\ell\geq1$ let $\partial^\ell[H]$ be the isomorphism class of the template obtained by removing all vertices at a distance greater than $\ell$ from the root. We endow $\mathfrak{G}$ with the coarsest topology that makes all the functions
\[
\Gamma\in\mathfrak{G}\mapsto\mathbf{1}\{\partial^\ell[\Gamma]=[\Gamma_0]\}\in\{0,1\}\qquad(\ell\geq1,\ \Gamma_0\in\mathfrak{G})
\]
continuous. Moreover, the space $\mathcal{P}(\mathfrak{G})$ of probability measures on $\mathfrak{G}$ carries the weak topology. So does the space $\mathcal{P}^2(\mathfrak{G})$ of probability measures on $\mathcal{P}(\mathfrak{G})$. For $\Gamma\in\mathfrak{G}$ we write $\delta_\Gamma\in\mathcal{P}(\mathfrak{G})$ for the Dirac measure that puts mass one on the single point $\Gamma$. Similarly, for $\lambda\in\mathcal{P}(\mathfrak{G})$ we let $\delta_\lambda\in\mathcal{P}^2(\mathfrak{G})$ be the Dirac measure on $\lambda$. Our assumption that the maximum degree is bounded by a fixed number $\Delta$ ensures that $\mathfrak{G},\mathcal{P}(\mathfrak{G}),\mathcal{P}^2(\mathfrak{G})$ are compact Polish spaces.

For a factor graph $G\in\mathcal{G}(\mathcal{M}_n)$ and a variable or constraint node $v$ we write $[G,v]$ for the isomorphism class of the connected component of $v$ in $G$ rooted at $v$. Then each factor graph $G\in\mathcal{G}(\mathcal{M}_n)$ gives rise to the empirical distribution
\[
\lambda_G=\frac{1}{|V_n|+|F_n|}\sum_{v\in V_n\cup F_n}\delta_{[G,v]}\in\mathcal{P}(\mathfrak{G}).
\]

We say that $\underline{\mathcal{M}}$ converges locally to $\vartheta\in\mathcal{P}(\mathfrak{G})$ if
\[
\lim_{n\to\infty}\mathrm{E}[\delta_{\lambda_{\mathbf{G}}}]=\delta_\vartheta. \tag{3.7}
\]

Denote a random isomorphism class chosen from the distribution $\vartheta$ by $\mathbf{T}=\mathbf{T}_\vartheta$. Unravelling the definitions, we see that (3.7) holds iff for every integer $\ell>0$ and every $[H]\in\mathfrak{G}$ we have
\[
\frac{1}{|V_n|+|F_n|}\sum_{v\in V_n\cup F_n}\mathbf{1}\{\partial^\ell[\mathbf{G},v]=[H]\}\ \xrightarrow{\ n\to\infty\ }\ P[\partial^\ell\mathbf{T}_\vartheta=[H]]\quad\text{in probability.} \tag{3.8}
\]
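The local statistics appearing in (3.8) can be computed in simple cases by canonically encoding each depth‐$\ell$ neighborhood. The sketch below (ours) does this for a plain adjacency-list structure, ignoring types and weight functions for brevity; excluding the parent makes the encoding match the tree picture on acyclic neighborhoods.

```python
# Sketch (ours) of empirical depth-ell neighborhood statistics, cf. (3.8).
from collections import Counter

def code(adj, v, depth, parent=None):
    """Canonical nested-tuple encoding of the depth-`depth` tree that the
    factor graph unfolds to around v (types/weights omitted)."""
    if depth == 0:
        return ()
    return tuple(sorted(code(adj, u, depth - 1, v)
                        for u in adj[v] if u != parent))

def empirical_local(adj, depth):
    """Fraction of nodes carrying each neighborhood code."""
    cnt = Counter(code(adj, v, depth) for v in range(len(adj)))
    return {c: k / len(adj) for c, k in cnt.items()}

# a path x0 - a0 - x1 - a1 - x2 (variables and constraints alternating)
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(empirical_local(adj, 2))
```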

We are going to be interested in the case that $\underline{\mathcal{M}}$ converges locally to a distribution $\vartheta$ on acyclic templates. Thus, let $\mathfrak{T}$ be the set of all acyclic templates. Further, we write $\mathfrak{V}$ for the set of all templates whose root is a variable node and $\mathfrak{F}$ for the set of all templates whose root is a constraint node. Additionally, for a template $[H]$ we write $r_{[H]}$ for the root vertex, $d_{[H]}$ for its degree and $\psi_{[H]}$ for the weight function of the root vertex if $[H]\in\mathfrak{F}$. Moreover, for $j\in[d_{[H]}]$ we write $[H]_j$ for the template obtained from $[H]$ by re‐rooting the template at the $j$th neighbor of $r_{[H]}$. (This makes sense because condition ISM6 from Definition 3.7 preserves the order of the neighbors.)

We will frequently condition on the depth‐$\ell$ neighborhoods of the random factor graph $\mathbf{G}$ for some finite $\ell$. Hence, for $G,G'\in\mathcal{G}(\mathcal{M}_n)$ and $\ell\geq1$ we write $G\equiv_\ell G'$ if $\partial^\ell[G,x]=\partial^\ell[G',x]$ for all variable nodes $x\in V_n$ and $\partial^{\ell+1}[G,a]=\partial^{\ell+1}[G',a]$ for all constraint nodes $a\in F_n$. Let $\mathcal{T}_\ell=\mathcal{T}_{\ell,\mathcal{M}_n}$ be the $\sigma$‐algebra on $\mathcal{G}(\mathcal{M}_n)$ generated by the equivalence classes of the relation $\equiv_\ell$. Additionally, for $G\in\mathcal{G}(\mathcal{M}_n)$ and $\ell\geq0$ we let
\[
\lambda_{G,\ell}=\frac{1}{|V_n|+|F_n|}\Big[\sum_{x\in V_n}\delta_{\partial^\ell[G,x]}+\sum_{a\in F_n}\delta_{\partial^{\ell+1}[G,a]}\Big]
\]
be the empirical distribution of the depth‐$\ell$ neighborhood structure.

Furthermore, let
\[
\mathfrak{T}_\ell=\{\partial^\ell T:T\in\mathfrak{T}\cap\mathfrak{V}\}\cup\{\partial^{\ell+1}T:T\in\mathfrak{T}\cap\mathfrak{F}\}.
\]
Then for a probability measure $\vartheta\in\mathcal{P}(\mathfrak{T})$ we denote by $\vartheta_\ell$ the image of $\vartheta$ under the map
\[
T\in\mathfrak{T}\ \mapsto\ \begin{cases}\partial^\ell T&\text{if }T\in\mathfrak{T}\cap\mathfrak{V},\\ \partial^{\ell+1}T&\text{if }T\in\mathfrak{T}\cap\mathfrak{F}.\end{cases}
\]

Because all degrees are bounded by $\Delta$, the set $\mathfrak{T}_\ell$ is finite for every $\ell\geq1$. Hence, (3.8) entails that $\underline{\mathcal{M}}$ converges locally to $\vartheta\in\mathcal{P}(\mathfrak{T})$ iff
\[
\lim_{n\to\infty}\mathrm{E}\|\lambda_{\mathbf{G},\ell}-\vartheta_\ell\|_{TV}=0\quad\text{for every }\ell\geq1. \tag{3.9}
\]

3.4. The Planted Distribution

While $\mathbf{G}$ is chosen uniformly at random (from the configuration model), we need to consider another distribution that weighs factor graphs by their partition function. Specifically, given $\ell\geq0$ let $\hat{\mathbf{G}}=\hat{\mathbf{G}}_{\ell,\mathcal{M}_n}$ be a random graph chosen according to the distribution
\[
P[\hat{\mathbf{G}}=G]=Z(G)\cdot\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}=G\}}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big]\qquad(G\in\mathcal{G}(\mathcal{M}_n)), \tag{3.10}
\]
which we call the planted distribution. The definition (3.10) ensures that the distribution of the “depth‐$\ell$ neighborhood structure” of $\hat{\mathbf{G}}$ coincides with that of $\mathbf{G}$.

Perhaps more intuitively, the planted distribution can be described by the following experiment. First, choose a random factor graph $\mathbf{G}$. Then, given $\mathbf{G}$, choose the factor graph $\hat{\mathbf{G}}$ randomly such that a graph $G\equiv_\ell\mathbf{G}$ comes up with a probability that is proportional to $Z(G)$. Perhaps despite appearances, the planted distribution is reasonably easy to work with in many cases. For instance, it has been employed successfully to study random k‐SAT as well as random graph or hypergraph coloring problems 1, 11, 13, 20, 26.
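Operationally, the experiment just described is a reweighting step. A rejection-sampling sketch (ours, assuming a conditional sampler for the $\mathcal{T}_\ell$‐class of $\mathbf{G}$ and an upper bound on $Z$ over that class — both hypothetical inputs) would look as follows.

```python
# Rejection-sampling sketch (ours) of the planted distribution (3.10)
# within one T_ell-equivalence class.
import random

def planted_sample(draw_conditional, Z, z_max, rng=random):
    """draw_conditional(): uniform sample from the T_ell-class of G;
    z_max: any upper bound on Z over that class (assumed available)."""
    while True:
        G = draw_conditional()
        if rng.random() < Z(G) / z_max:   # accept proportionally to Z(G)
            return G
```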

3.5. Short Cycles

In most cases of interest the random factor graph is unlikely to contain many short cycles, and it will be convenient for us to exploit this fact. Hence, let us call a factor graph $G$ $l$‐acyclic if it does not contain a cycle of length at most $l$. We say that the sequence $\underline{\mathcal{M}}$ of models has high girth if for any $\ell,l>0$ we have
\[
\liminf_{n\to\infty}P[\mathbf{G}\text{ is }l\text{-acyclic}]>0,\qquad\liminf_{n\to\infty}P[\hat{\mathbf{G}}\text{ is }l\text{-acyclic}]>0. \tag{3.11}
\]

Thus, there is a non‐vanishing probability that the random factor graph G is l‐acyclic. Moreover, short cycles do not have too heavy an impact on the partition function as the graph chosen from the planted distribution has a non‐vanishing probability of being l‐acyclic as well.

In the following, we are going to denote the event that a random factor graph is $l$‐acyclic by $\mathcal{A}_l$. Let us highlight the following consequence of the high girth condition and the construction of the planted distribution.

Proposition 3.8

Assume that $\underline{\mathcal{M}}$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models of high girth. Let $\ell\geq1$ be an integer and suppose that $\mathcal{E}$ is an event such that $\lim_{n\to\infty}P[\hat{\mathbf{G}}\in\mathcal{E}]=1$. If $b$ is a real and $l\geq0$ is an integer such that
\[
\lim_{n\to\infty}P\big[\ln\mathrm{E}[Z(\mathbf{G})|\mathcal{T}_\ell]\geq bn\,\big|\,\mathcal{A}_l\big]=1, \tag{3.12}
\]
then $\liminf_{n\to\infty}\frac1n\ln\mathrm{E}[\mathbf{1}\{\mathcal{E}\cap\mathcal{A}_l\}Z(\mathbf{G})]\geq b$.

Since $\lim_nP[\hat{\mathbf{G}}\in\mathcal{E}]=1$, the high girth condition (3.11) implies that $\lim_nP[\hat{\mathbf{G}}\in\mathcal{E}|\mathcal{A}_l]=1$ for every $l$. Set $\mathcal{E}_l=\mathcal{E}\cap\mathcal{A}_l$. Then by the definition (3.10) of the planted distribution,
\[
1-o(1)=P[\hat{\mathbf{G}}\in\mathcal{E}_l|\mathcal{A}_l]=\sum_{G\in\mathcal{E}_l}Z(G)\,\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}=G\}}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big]=\mathrm{E}\Big[\frac{\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z(\mathbf{G})}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big]
=\mathrm{E}\Big[\frac{\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]}{\mathrm{E}[Z|\mathcal{T}_\ell]}\Big|\mathcal{A}_l\Big].
\]
Consequently, $P\big[\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]\geq\mathrm{E}[Z|\mathcal{T}_\ell]/2\,\big|\,\mathcal{A}_l\big]=1-o(1)$. Hence, (3.12) yields
\[
P\big[\ln\mathrm{E}[\mathbf{1}\{\mathbf{G}\in\mathcal{E}_l\}Z|\mathcal{T}_\ell]\geq bn-1\,\big|\,\mathcal{A}_l\big]=1-o(1).
\]
Therefore, the assertion follows from (3.11).

Remark 3.9

Strictly speaking, the first condition in (3.11) is superfluous as it is implied by the second one.

From here on out we assume that $\underline{\mathcal{M}}$ is a sequence of $(\Delta,\Omega,\Psi,\Theta)$‐models of high girth that converges locally to $\vartheta\in\mathcal{P}(\mathfrak{T})$, and we fix $\Delta,\Omega,\Theta,\Psi$ for the rest of the paper.

4. THE BETHE FREE ENERGY

In this section we present the main results of the paper. The thrust is that certain basic properties of the Gibbs measure entail an asymptotic formula for $\mathrm{E}[\ln Z(\mathbf{G})]$. The results are guided by the physics predictions from 34.

4.1. An Educated Guess

The formula for $\mathrm{E}[\ln Z(\mathbf{G})]$ that the cavity method predicts, the so‐called “replica symmetric solution”, comes in terms of the distribution $\vartheta$ to which $\underline{\mathcal{M}}$ converges locally. Thus, the cavity method claims that in order to calculate $\mathrm{E}[\ln Z(\mathbf{G})]$ it is not necessary to deal with the mind‐boggling complexity of the random factor graph with its expansion properties, long cycles etc. Instead, it suffices to think about the random tree $\mathbf{T}=\mathbf{T}_\vartheta$, a dramatically simpler object. The following definition will help us formalise this notion.

Definition 4.1

A marginal assignment is a measurable map
\[
p:\mathfrak{T}\to\bigcup_{j=1}^{\Delta}\mathcal{P}(\Omega^j),\qquad T\mapsto p_T
\]
such that

  • MA1: $p_T\in\mathcal{P}(\Omega)$ for all $T\in\mathfrak{V}$,

  • MA2: $p_T\in\mathcal{P}(\Omega^{d_T})$ and $(p_T)_j=p_{T_j}$ for all $T\in\mathfrak{F}$, $j\in[d_T]$,

  • MA3: for all $T\in\mathfrak{F}$ we have
\[
H(p_T)+\langle\ln\psi_T(\sigma)\rangle_{p_T}=\max\big\{H(\nu)+\langle\ln\psi_T(\sigma)\rangle_\nu:\nu\in\mathcal{P}(\Omega^{d_T})\ \text{s.t.}\ \nu_j=p_{T_j}\ \text{for all }j\in[d_T]\big\}. \tag{4.1}
\]

Further, the Bethe free energy of $p$ with respect to $\vartheta$ is
\[
\mathcal{B}_\vartheta(p)=\mathrm{E}\big[(1-d_{\mathbf{T}})H(p_{\mathbf{T}})\,\big|\,\mathfrak{V}\big]+\frac{P[\mathbf{T}\in\mathfrak{F}]}{P[\mathbf{T}\in\mathfrak{V}]}\,\mathrm{E}\big[H(p_{\mathbf{T}})+\langle\ln\psi_{\mathbf{T}}(\sigma)\rangle_{p_{\mathbf{T}}}\,\big|\,\mathfrak{F}\big], \tag{4.2}
\]
where, of course, $\mathrm{E}[\cdot],P[\cdot]$ refer to the choice of the random tree $\mathbf{T}=\mathbf{T}_\vartheta$.
Thus, a marginal assignment provides a probability distribution $p_T$ on $\Omega$ for each tree whose root is a variable node. Furthermore, for trees $T$ rooted at a constraint node $p_T$ is a distribution on $\Omega^{d_T}$, which we think of as the joint distribution of the variables involved in the constraint. The distributions assigned to $T$ rooted at a constraint node must satisfy a consistency condition: the $j$th marginal of $p_T$ has to coincide with the distribution assigned to the tree $T_j$ rooted at the $j$th child of the root of $T$ for every $j\in[d_T]$; of course, $T_j$ is a tree rooted at a variable node. In addition, MA3 requires that for $T\in\mathfrak{F}$ the distribution $p_T$ maximises the functional $\nu\mapsto H(\nu)+\langle\ln\psi_T(\sigma)\rangle_\nu$ amongst all distributions $\nu$ with the same marginal distributions as $p_T$. Furthermore, the Bethe free energy is a functional that maps each marginal assignment $p$ to a real number. For a detailed derivation of this formula based on physics intuition we refer to 36.

The basic idea behind Definition 4.1 is to capture the limiting distribution of the marginals of the variables of the random factor graph $\mathbf{G}$. More specifically, Definition 4.1 aims to provide the “limiting object” of the following combinatorial construction: for a fixed $\ell$ take a random factor graph $\mathbf{G}$ on a large enough number $n$ of variable nodes and for each possible tree $T\in\mathfrak{T}_\ell$ record the empirical distribution of the marginals of the nodes whose neighborhood is isomorphic to $T$. In the simplest possible case (which here we confine ourselves to), in the limit of large $\ell,n$ we expect to obtain distributions that satisfy MA2. That is, in the limit of large $\ell$ the empirical distribution of the marginals of variable nodes converges to a deterministic limit; going from $\ell$ to $\ell+1$ (the depth up to which constraint nodes can “see”) does not make much of a difference. Moreover, in the proof of Corollary 4.8 we are going to see that MA3 is the “correct” way of linking the constraint/variable distributions.

Given a distribution $\vartheta$ on trees, the cavity method provides a plausible recipe for constructing marginal assignments. Roughly speaking, the idea is to identify fixed points of an operator called Belief Propagation on the random infinite tree 36. However, this procedure is difficult to formalise mathematically because generally there are several Belief Propagation fixed points and model‐dependent considerations are necessary to identify the “correct” one. To keep matters as simple as possible we are therefore going to assume that a marginal assignment is given.
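For orientation, here is what this recipe boils down to in the simplest concrete case, the Ising model on the random $d$‐regular graph from Example 3.3: iterate the Belief Propagation update on the infinite $d$‐regular tree and plug the fixed point into the standard vertex/edge form of the Bethe functional. The sketch is ours, and — as just discussed — the fixed point it finds need not be the “correct” one in general.

```python
# BP fixed point and Bethe free energy per variable (ours) for the Ising
# model psi(s,s') = exp(beta*s*s') on the infinite d-regular tree.
import math

def bethe_ising(beta, d, iters=1000, init=0.9):
    m = init                                   # message: P[sigma = +1]
    for _ in range(iters):                     # BP update over d-1 children
        up = (math.exp(beta) * m + math.exp(-beta) * (1 - m)) ** (d - 1)
        dn = (math.exp(-beta) * m + math.exp(beta) * (1 - m)) ** (d - 1)
        m = up / (up + dn)
    vertex = ((math.exp(beta) * m + math.exp(-beta) * (1 - m)) ** d
              + (math.exp(-beta) * m + math.exp(beta) * (1 - m)) ** d)
    edge = (math.exp(beta) * (m * m + (1 - m) ** 2)
            + math.exp(-beta) * 2 * m * (1 - m))
    return math.log(vertex) - (d / 2) * math.log(edge)

print(bethe_ising(0.2, 3))   # approx log 2 + (3/2) log cosh(0.2)
```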

Remark 4.2

Because the entropy is concave, conditions MA2 and MA3 specify the distributions $p_T$ for $T\in\mathfrak{F}$ uniquely. In other words, a marginal assignment is actually determined completely by the distributions $p_T$ for $T\in\mathfrak{V}$.

For a marginal assignment $p$, an integer $\ell$ and a tree $T\in\mathfrak{T}\cap\mathfrak{V}$ we define
\[
p_{\ell,T}=\mathrm{E}\big[p_{\mathbf{T}}\,\big|\,\partial^\ell\mathbf{T}=\partial^\ell T\big].
\]
Thus, $p_{\ell,T}$ is the conditional expectation of $p_{\mathbf{T}}$ given the first $\ell$ layers of the tree. To avoid notational hazards we let $p_T,p_{\ell,T}$ be the uniform distribution on $\Omega$ for all $T\in\mathfrak{G}\setminus\mathfrak{T}$.

Lemma 4.3

For any $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have
\[
\mathrm{E}\big[\|p_{\ell,\mathbf{T}}-p_{\mathbf{T}}\|_{TV}\,\big|\,\mathbf{T}\in\mathfrak{V}\big]<\varepsilon.
\]

Define an equivalence relation on $\mathfrak{T}\cap\mathfrak{V}$ by letting $T\equiv_\ell T'$ iff $\partial^\ell T=\partial^\ell T'$. Then for any $\omega\in\Omega$ the sequence of random variables $X_\ell(\mathbf{T})=p_{\ell,\mathbf{T}}(\omega)$ is a martingale with respect to the filtration generated by the equivalence classes of $\equiv_\ell$. By the martingale convergence theorem [27, Theorem 5.7], $(p_{\ell,\mathbf{T}})_\ell$ converges $\vartheta$‐almost surely to $p_{\mathbf{T}}$.

Unless specified otherwise, in the rest of this section p is understood to be a marginal assignment.

4.2. Symmetry

In the terminology of Section 2, the cavity method claims that $\frac1n\mathrm{E}[\ln Z(\mathbf{G})]$ converges to the Bethe free energy of a suitable marginal assignment iff
\[
\lim_{n\to\infty}P[\mu_{\mathbf{G}}\text{ is }(\varepsilon,2)\text{-symmetric}]=1\quad\text{for any }\varepsilon>0\quad\text{(see 34)}. \tag{4.3}
\]

This claim is, of course, based on bold non‐rigorous deliberations. Nonetheless, we aim to prove a rigorous statement that comes reasonably close.

To this end, let $p$ be a marginal assignment. We say that $\underline M$ is p-symmetric if for every $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1{n^2}\sum_{x,y\in V_n}\big\|\mu_{\boldsymbol G,\{x,y\}}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\otimes p_{\ell,\partial^\ell[\boldsymbol G,y]}\big\|_{\mathrm{TV}}>\varepsilon\Big]=0.$ (4.4)

In other words, for any $\varepsilon>0$ for sufficiently large $\ell$ the random factor graph $\boldsymbol G$ enjoys the following property with high probability. If we pick two variable nodes $x,y$ of $\boldsymbol G$ uniformly and independently, then the joint distribution $\mu_{\boldsymbol G,\{x,y\}}$ is close to the product distribution $p_{\ell,\partial^\ell[\boldsymbol G,x]}\otimes p_{\ell,\partial^\ell[\boldsymbol G,y]}$ determined by the depth-$\ell$ neighborhoods of $x,y$. Of course, as $\boldsymbol G$ has bounded maximum degree the distance between randomly chosen $x,y$ is going to be greater than, say, $\ln\ln n$ with high probability. Thus, similar in spirit to (4.3), (4.4) provides that far-apart variables typically decorrelate and that $p$ captures the Gibbs marginals.
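To make the statistic in (4.4) concrete, the following brute-force sketch (ours) evaluates the average total variation distance between pairwise Gibbs marginals and the corresponding product distributions on a tiny factor graph, with the exact single-variable marginals standing in for the limits $p_{\ell,\cdot}$ (which are of course not available on a toy instance); diagonal pairs $x=y$ are included for simplicity.

```python
import numpy as np
from itertools import product

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# brute-force Gibbs measure of a tiny factor graph (weights ours)
psis = {(0, 1): np.array([[2., 1.], [1., 2.]]),
        (1, 2): np.array([[1., 3.], [3., 1.]]),
        (2, 3): np.array([[2., 1.], [1., 2.]])}
n, q = 4, 2
states = list(product(range(q), repeat=n))
w = np.array([np.prod([psis[f][s[f[0]], s[f[1]]] for f in psis])
              for s in states])
mu = w / w.sum()

def marg(xs):  # joint Gibbs marginal of the variables in xs
    out = np.zeros((q,) * len(xs))
    for s, p in zip(states, mu):
        out[tuple(s[x] for x in xs)] += p
    return out

# the symmetry statistic from (4.4), with the exact marginals in place
# of the limiting p_{l,.} -- an illustration, not the actual limit object
stat = np.mean([tv(marg([x, y]), np.outer(marg([x]), marg([y])))
                for x in range(n) for y in range(n)])
print(stat)
```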

In analogy to (4.4), we say that the planted distribution of $\underline M$ is p-symmetric if for every $\varepsilon>0$ there is $\ell_0>0$ such that for all $\ell>\ell_0$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1{n^2}\sum_{x,y\in V_n}\big\|\mu_{\hat{\boldsymbol G},\{x,y\}}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\otimes p_{\ell,\partial^\ell[\hat{\boldsymbol G},y]}\big\|_{\mathrm{TV}}>\varepsilon\Big]=0.$

The main result of this paper is

Theorem 4.4

If $\underline M$ is p-symmetric, then

$\limsup_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]\leq\mathcal B_\vartheta(p).$

If the planted distribution of $\underline M$ is p-symmetric as well, then

$\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]=\mathcal B_\vartheta(p).$

Thus, the basic symmetry assumption (4.4) implies that $\mathcal B_\vartheta(p)$ is an upper bound on $\frac1n\mathbb E[\ln Z(\boldsymbol G)]$. If, additionally, the symmetry condition holds in the planted model, then this upper bound is tight. In particular, in this case $\frac1n\mathbb E[\ln Z(\boldsymbol G)]$ is completely determined by the limiting local structure $\vartheta$ and $p$.

The proof of Theorem 4.4, which can be found in Section 4.6, is based on Theorem 2.1, the decomposition theorem for probability measures on cubes. More precisely, we combine Theorem 2.1 with a conditional first and a second moment argument given the local structure of the factor graph, i.e., given $\mathfrak T_\ell$ for a large $\ell$. The fact that it is necessary to condition on the local structure in order to cope with "lottery effects" has been noticed in prior work [6, 17, 22, 23]. Most prominently, such a conditioning was crucial in order to obtain the precise k-SAT threshold for large enough k [26]. But here the key insight is that Theorem 2.1 enables us to carry out conditional moment calculations in a fairly elegant and generic way.

The obvious question that arises from Theorem 4.4 is whether there is a simple way to show that $\underline M$ is p-symmetric (and that the same is true of the planted distribution). In Sections 4.3 and 4.4 we provide two sufficient conditions called non-reconstruction and Gibbs uniqueness. That these two conditions entail symmetry was predicted in [34], and Theorem 2.1 enables us to prove it.

While the present paper deals with a very general class of factor graphs, the methods give somewhat stronger results in special classes, e.g., models with only one type in which all variable nodes have the same degree, or in which the variable nodes have Poisson degrees. The details have been worked out in [18, 19].

4.3. Non‐reconstruction

Following [34] we define a correlation decay condition, the "non-reconstruction" condition, on factor graphs and show that it implies symmetry. The basic idea is to formalise the following. Given $\varepsilon>0$ pick a large $\ell=\ell(\varepsilon)>1$, choose a random factor graph $\boldsymbol G$ for some large $n$ and pick a variable node $x$ uniformly at random. Further, sample an assignment $\sigma$ randomly from the Gibbs measure $\mu_{\boldsymbol G}$. Now, sample a second assignment $\tau$ from $\mu_{\boldsymbol G}$ subject to the condition that $\tau(y)=\sigma(y)$ for all variable nodes $y$ at distance at least $\ell$ from $x$. Then the non-reconstruction condition asks whether the distribution of $\tau(x)$ is markedly different from the unconditional marginal $\mu_{\boldsymbol G,x}$. More precisely, non-reconstruction occurs if for any $\varepsilon$ there is $\ell(\varepsilon)$ such that with high probability $\boldsymbol G$ is such that the shift that a random "boundary condition" $\sigma$ induces does not exceed $\varepsilon$ in total variation distance.

Of course, instead of conditioning on the values of all variables at distance at least $\ell$ from $x$, we might as well just condition on the variables at distance either $\ell$ or $\ell+1$ from $x$, depending on the parity of $\ell$. This is immediate from the definition (3.4) of the Gibbs measure.

As for the formal definition, suppose that $G\in\mathcal G(M_n)$ is a factor graph, let $x\in V_n$ and let $\ell\geq1$. Let $\mathcal F_\ell(G,x)$ signify the $\sigma$-algebra on $\Omega^n$ generated by the events $\mathbf 1\{\sigma(y)=\omega\}$ for $\omega\in\Omega$ and $y\in V_n$ at distance either $\ell$ or $\ell+1$ from $x$. Thus, $\mathcal F_\ell(G,x)$ pins down all $\sigma(y)$ for $y$ at distance $\ell$ from $x$ if $\ell$ is even and $\ell+1$ otherwise. Then we say that $\underline M$ has non-reconstruction with respect to a marginal assignment $p$ if for any $\varepsilon>0$ there is $\ell>0$ such that

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(\boldsymbol G,x)\big\rangle_{\boldsymbol G}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\boldsymbol G}>\varepsilon\Big]=0.$

To parse the above, the outer $\mathbb P[\cdot]$ refers to the choice of $\boldsymbol G$. The big outer $\langle\cdot\rangle_{\boldsymbol G}$ is the choice of the boundary condition called $\sigma$ above. Finally, $\langle\cdot\,|\,\mathcal F_\ell(\boldsymbol G,x)\rangle_{\boldsymbol G}$ is the random choice given the boundary condition.

Analogously, the planted distribution of $\underline M$ has non-reconstruction with respect to $p$ if for any $\varepsilon>0$ there exists $\ell>0$ such that

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(\hat{\boldsymbol G},x)\big\rangle_{\hat{\boldsymbol G}}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\big\|_{\mathrm{TV}}\Big\rangle_{\hat{\boldsymbol G}}>\varepsilon\Big]=0.$

Theorem 4.5

If $\underline M$ has non-reconstruction with respect to $p$, then $\underline M$ is p-symmetric. If the planted distribution of $\underline M$ has non-reconstruction with respect to $p$, then the planted distribution is p-symmetric.

In concrete applications the non-reconstruction condition is typically reasonably easy to verify. For instance, in [11] we determine the precise location of the so-called "condensation phase transition" in the regular k-SAT model via Theorems 4.4 and 4.5. The proof of Theorem 4.5 can be found in Section 4.7.
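The following toy computation (ours; ferromagnetic-Ising-style weights, chosen arbitrarily) makes the non-reconstruction quantity tangible: on a small tree it averages, over a Gibbs-typical boundary condition on the leaves, the total variation shift that conditioning on the boundary induces on the root marginal.

```python
import numpy as np
from itertools import product

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# complete binary tree of depth 2, root 0, leaves 3..6 (couplings ours)
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
eps = 0.3  # edge noise; smaller eps = stronger correlation
psi = np.array([[1 - eps, eps], [eps, 1 - eps]])
n, q = 7, 2
states = list(product(range(q), repeat=n))
w = np.array([np.prod([psi[s[u], s[v]] for u, v in edges]) for s in states])
mu = w / w.sum()

root = np.zeros(q)
for s, p in zip(states, mu):
    root[s[0]] += p

# average TV-shift of the root marginal caused by a Gibbs-typical boundary
leaves = [3, 4, 5, 6]
shift, seen = 0.0, {}
for s, p in zip(states, mu):
    b = tuple(s[x] for x in leaves)
    if b not in seen:
        cond = np.zeros(q)
        for t, pt in zip(states, mu):
            if tuple(t[x] for x in leaves) == b:
                cond[t[0]] += pt
        seen[b] = tv(cond / cond.sum(), root)
    shift += p * seen[b]
print(shift)  # decays with depth in the non-reconstruction regime
```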

4.4. Gibbs Uniqueness

Although the non-reconstruction condition is reasonably handy, to verify it we still need to "touch" the complex random graph $\boldsymbol G$. Ideally, we might hope for a condition that can be stated solely in terms of the limiting distribution $\vartheta$ on trees, which is conceptually far more accessible. The "Gibbs uniqueness" condition as put forward in [34] fits this bill.

Specifically, suppose that $T$ is a finite acyclic template whose root $r_T$ is a variable node. Then we say that $T$ is $(\varepsilon,\ell)$-unique with respect to a marginal assignment $p$ if

$\big\|\big\langle\sigma[\cdot|r_T]\,\big|\,\mathcal F_\ell(T,r_T)\big\rangle_T-p_T\big\|_{\mathrm{TV}}<\varepsilon.$ (4.5)

To parse (4.5), we observe that $\langle\sigma[\cdot|r_T]\,|\,\mathcal F_\ell(T,r_T)\rangle_T$ is a random variable, namely the average of the value $\sigma(r_T)$ assigned to the root variable under the Gibbs measure $\mu_T$ given the values of the variables at distance at least $\ell$ from $r_T$. Hence, (4.5) requires that $\langle\sigma[\cdot|r_T]\,|\,\mathcal F_\ell(T,r_T)\rangle_T$ is at total variation distance less than $\varepsilon$ from $p_T$ for every possible assignment of the variables at distance at least $\ell$ from $r_T$, i.e., for every "boundary condition".

More generally, we say that $T\in\mathcal T_V$ is $(\varepsilon,\ell)$-unique with respect to $p$ if the finite template $\partial^{\ell+1}T$ has this property. (That $\partial^{\ell+1}T$ is finite follows once more from the fact that all degrees are bounded by $\Delta$.) Further, we call the measure $\vartheta\in\mathcal P(\mathcal T)$ Gibbs-unique with respect to $p$ if for any $\varepsilon>0$ we have

$\lim_{\ell\to\infty}\mathbb P\big[\boldsymbol T\ \text{is }(\varepsilon,\ell)\text{-unique w.r.t. }p\big]=1.$
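On a path, the $(\varepsilon,\ell)$-uniqueness condition can be checked exactly by maximising the boundary influence over all boundary values, as in the following sketch (ours; the edge weights are an arbitrary choice); the worst-case total variation distance decays geometrically in the depth, which is the hallmark of Gibbs uniqueness.

```python
import numpy as np

def tv(a, b):
    return 0.5 * np.abs(a - b).sum()

# a path of length `depth` with the root at one end (edge weights ours):
# worst-case influence of the boundary value on the root marginal, i.e.
# the quantity required to be below eps by (eps, l)-uniqueness
eps = 0.3
psi = np.array([[1 - eps, eps], [eps, 1 - eps]])
q = 2
uniform = np.ones(q) / q  # unconditional root marginal, by symmetry here

def root_given_boundary(depth, b):
    msg = np.eye(q)[b]          # pin the boundary variable to value b
    for _ in range(depth):      # propagate the condition edge by edge
        msg = psi @ msg
    return msg / msg.sum()

for depth in (1, 2, 4, 8):
    worst = max(tv(root_given_boundary(depth, b), uniform) for b in range(q))
    print(depth, worst)  # decays geometrically: Gibbs uniqueness holds
```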

Corollary 4.6

If $\vartheta\in\mathcal P(\mathcal T)$ is Gibbs-unique with respect to $p$, then

$\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]=\mathcal B_\vartheta(p).$

If $\vartheta$ is Gibbs-unique with respect to $p$, then (3.9) guarantees that $\underline M$ has non-reconstruction with respect to $p$. Indeed, given $\varepsilon>0,\ell>0$ and a graph $G$ let $\mathcal U(G,\varepsilon,\ell)$ denote the set of vertices $x\in V_n$ for which $\partial^{\ell+1}[G,x]$ is acyclic and $(\varepsilon,\ell)$-unique. Then, as each $x\in\mathcal U(G,\varepsilon,\ell)$ contributes at most $\varepsilon$ while the remaining terms are bounded by one, we have

$\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_G\leq\varepsilon+\Big(1-\frac{|\mathcal U(G,\varepsilon,\ell)|}{n}\Big),$

and by (3.9) the probability $\mathbb P[|\mathcal U(\boldsymbol G,\varepsilon,\ell)|\leq(1-\varepsilon)n]$ tends to 0 as $n\to\infty$. Similarly, because the distribution of the depth-$\ell$ neighborhood structure in the planted distribution $\hat{\boldsymbol G}$ coincides with $\vartheta$, Gibbs-uniqueness implies that the planted model has non-reconstruction with respect to $p$ as well. Therefore, the assertion follows from Theorems 4.4 and 4.5.

In problems such as the random k-SAT model, the Ising model or the Potts antiferromagnet that come with an "inverse temperature" parameter $\beta\geq0$, Gibbs uniqueness is always satisfied for sufficiently small values of $\beta$. Consequently, Corollary 4.6 shows that the cavity method always yields the correct value of $\lim_{n\to\infty}\frac1n\mathbb E[\ln Z(\boldsymbol G)]$ in the case of small $\beta$, the so-called "high temperature" case in physics jargon. Furthermore, if the Gibbs uniqueness condition is satisfied then there is a canonical way of constructing the marginal assignment $p$ by means of the Belief Propagation algorithm [36, Chapter 14]. Hence, Corollary 4.6 provides a comprehensive answer in this case.

4.5. Meet the Expectation

In this section we lay the groundwork for proving Theorem 4.4. In particular, the conditions MA1–MA3 will be used in the proofs of Corollaries 4.8 and 4.9 in this section, which will be vital to the proof of Theorem 4.4 in Section 4.6. To proceed, we need to get a handle on the conditional expectation of $Z$ given $\mathfrak T_\ell$, and for this purpose we need to study the possible empirical distributions of the values assigned to the variables of a concrete factor graph $G\in\mathcal G(M_n)$. Specifically, by a $(G,\ell)$-marginal sequence we mean a map $q:\mathcal T_\ell\to\bigcup_{j=1}^\Delta\mathcal P(\Omega^j),\ T\mapsto q_T$ such that

  • 1

    MS1: $q_T\in\mathcal P(\Omega)$ if $T\in\mathcal T_V\cap\mathcal T_\ell$,

  • 2

    MS2: $q_T\in\mathcal P(\Omega^{d_T})$ if $T\in\mathcal T_F\cap\mathcal T_\ell$,

  • 3
    MS3: for all $T\in\mathcal T_V\cap\mathcal T_\ell$ we have, with $q_{T'}^j$ denoting the $j$th marginal of $q_{T'}$,
    $\sum_{T'\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_{T'}]}\lambda_{G,\ell}(T')\,\mathbf 1\{T'_j=T\}\,\big(q_{T'}^j-q_T\big)=0.$ (4.6)

Thus, $q$ assigns each tree $T\in\mathcal T_\ell$ rooted at a variable node a distribution on $\Omega$ and each tree $T\in\mathcal T_\ell$ rooted at a constraint node a distribution on $\Omega^{d_T}$, just like in Definition 4.1. Furthermore, the consistency condition (4.6) provides that for a given $T$ rooted at a variable the average marginal distribution over all $T',j$ such that $T'_j=T$ is equal to $q_T$. However, in contrast to condition MA2 from Definition 4.1, MS3 does not require this marginalisation to work out for every $T',j$ individually.

Suppose now that $U\subseteq F_n$ is a set of constraint nodes such that $d(a)=d_0$ for all $a\in U$. Then for $\sigma:V_n\to\Omega$ we let

$\sigma[(\omega_1,\dots,\omega_{d_0})|U]=\frac1{|U|}\sum_{a\in U}\prod_{j=1}^{d_0}\mathbf 1\{\sigma(\partial(G,a,j))=\omega_j\}.$

Thus, $\sigma[\cdot|U]\in\mathcal P(\Omega^{d_0})$ is the empirical distribution of the sequences $\{(\sigma(\partial(G,a,1)),\dots,\sigma(\partial(G,a,d_0))):a\in U\}$. A factor graph $G$ and $\sigma:V_n\to\Omega$ induce a $(G,\ell)$-marginal sequence $q_{G,\sigma,\ell}$ canonically, namely the empirical distributions

$q_{G,\sigma,\ell,T}=\sigma[\cdot|\{x\in V_n:\partial^\ell[G,x]=T\}]\quad\text{for }T\in\mathcal T_V\cap\mathcal T_\ell,$
$q_{G,\sigma,\ell,T}=\sigma[\cdot|\{a\in F_n:\partial^{\ell+1}[G,a]=T\}]\quad\text{for }T\in\mathcal T_F\cap\mathcal T_\ell.$

Conversely, given a $(G,\ell)$-marginal sequence $q$ let $\Sigma(G,\ell,q,\delta)$ be the set of all $\sigma:V_n\to\Omega$ such that for all $T\in\mathcal T_V\cap\mathcal T_\ell$, $T'\in\mathcal T_F\cap\mathcal T_\ell$ we have

$\|q_{G,\sigma,\ell,T}-q_T\|_{\mathrm{TV}}\leq\delta,\qquad\|q_{G,\sigma,\ell,T'}-q_{T'}\|_{\mathrm{TV}}\leq\delta.$ (4.7)

Moreover, let

$Z_{\ell,q,\delta}(G)=Z(G)\,\big\langle\mathbf 1\{\sigma\in\Sigma(G,\ell,q,\delta)\}\big\rangle_G.$

Finally, define

$\mathcal B_{G,\ell}(q)=\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}(1-d_T)H(q_T)\,\lambda_{G,\ell}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,\ell}(T|\mathcal T_F).$
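The functional $\mathcal B_{G,\ell}(q)$ is a finite sum and thus straightforward to evaluate once the empirical data are collected; the following sketch (ours; all argument names are our own) spells this out.

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def bethe_functional(var_terms, fac_terms, fac_var_ratio):
    """Evaluate B_{G,l}(q) from empirical data (a sketch; argument names
    ours).  var_terms: list of (lambda_T, d_T, q_T) for variable trees;
    fac_terms: list of (lambda_T, q_T, ln_psi_T) for constraint trees,
    where ln_psi_T is the table of ln psi_T on Omega^{d_T};
    fac_var_ratio = |F_n| / |V_n|."""
    val = sum(lam * (1 - d) * H(qT) for lam, d, qT in var_terms)
    val += fac_var_ratio * sum(
        lam * (H(qT) + (np.asarray(qT) * lnpsi).sum())
        for lam, qT, lnpsi in fac_terms)
    return val

# toy instance: one variable tree of degree 2, one binary constraint tree
qv = np.array([0.5, 0.5])
qf = np.array([[0.3, 0.2], [0.2, 0.3]])
lnpsi = np.log(np.array([[2.0, 1.0], [1.0, 2.0]]))
print(bethe_functional([(1.0, 2, qv)], [(1.0, qf, lnpsi)], 1.0))
```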

In Section 5 we are going to prove the following formula for the expectation of $Z_{\ell,q,\delta}(\boldsymbol G)$.

Proposition 4.7

For any $\varepsilon>0,\ell>0$ there is $\delta>0$ such that for large enough $n$ the following is true. Assume that $G\in\mathcal G(M_n)$ is $100\ell$-acyclic and let $q$ be a $(G,\ell)$-marginal sequence. Then

$\Big|n^{-1}\ln\mathbb E\big[\mathbf 1\{\mathcal A_{2\ell+5}\}Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]-\mathcal B_{G,\ell}(q)\Big|<\varepsilon,$

where $\boldsymbol G\cong_\ell G$ denotes the event that the depth-$\ell$ local structure of $\boldsymbol G$ coincides with that of $G$.

We are going to be particularly interested in the expectation of $Z_{\ell,q,\delta}(\boldsymbol G)$ for $q$ "close" to a specific marginal assignment $p$ (in the sense of Definition 4.1). Formally, a $(G,\ell)$-marginal sequence $q$ is $(\varepsilon,\ell)$-judicious with respect to $p$ if

$\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}\lambda_{G,\ell}[T|\mathcal T_V]\,\|q_T-p_{\ell,T}\|_{\mathrm{TV}}+\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\lambda_{G,\ell}[T|\mathcal T_F]\,\|q_T^j-p_{\ell,T_j}\|_{\mathrm{TV}}<\varepsilon.$

We say that $(G,\sigma)$ is $(\varepsilon,\ell)$-judicious with respect to $p$ if the empirical distribution $q_{G,\sigma,\ell}$ is $(\varepsilon,\ell)$-judicious w.r.t. $p$.

Let us explain this definition briefly. Suppose we are given a factor graph $G$ and a certain "depth" $\ell$. Then for an assignment $\sigma$ we can jot down the empirical distribution of the values assigned to the variables $x$ with $\partial^\ell[G,x]=T$ for each tree $T\in\mathcal T_V\cap\mathcal T_\ell$. The "judicious" condition essentially provides that these empirical distributions are fairly "homogeneous". That is, if we refine our classification of variable nodes according to the depth-$(\ell+1)$ structure $\partial^{\ell+1}[G,x]$, then the resulting empirical distributions are close to the coarser ones obtained at level $\ell$. Clearly, this condition is closely related to the MA2 condition from Definition 4.1.
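In miniature, the empirical distributions $q_{G,\sigma,\ell,T}$ and their refinement at the next depth can be tabulated as follows (a sketch of ours, with the variable's degree and the sorted neighbour degrees serving as crude stand-ins for the depth-$\ell$ and depth-$(\ell+1)$ classes).

```python
from collections import Counter, defaultdict

# empirical marginals of an assignment, grouped by a crude local-type
# proxy -- an illustration of q_{G,sigma,l}, not the real construction
edges = [(0, 'a'), (1, 'a'), (1, 'b'), (2, 'b'), (3, 'b'), (3, 'c'), (4, 'c')]
sigma = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}

vdeg = Counter(v for v, _ in edges)
fdeg = Counter(f for _, f in edges)
t1 = {v: vdeg[v] for v in sigma}                              # "depth l"
t2 = {v: (vdeg[v], tuple(sorted(fdeg[f] for w, f in edges if w == v)))
      for v in sigma}                                         # "depth l+1"

def emp(types):  # empirical value distribution within each class
    g = defaultdict(list)
    for v, t in types.items():
        g[t].append(sigma[v])
    return {t: Counter(vals) for t, vals in g.items()}

print(emp(t1))
print(emp(t2))  # the refinement; judiciousness compares the two levels
```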

Before we prove Theorem 4.4 we deduce two corollaries to Proposition 4.7. The first one gives an upper bound on the “judicious part” of E[Z(G)]. The second one yields a lower bound.

Corollary 4.8

Suppose that $p$ is a marginal assignment. For any $\alpha>0$ there exist $\varepsilon>0,\ell>0$ such that for all $0<\beta,\gamma<\varepsilon$ and all $l\geq\ell$ the following is true. Let $\mathcal L(\gamma,l)$ be the event that $\|\lambda_{\boldsymbol G,l}-\vartheta_l\|_{\mathrm{TV}}<\gamma$. Then

$\limsup_{n\to\infty}\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\gamma,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\big]\leq\mathcal B_\vartheta(p)+\alpha.$

Pick a small enough $\varepsilon=\varepsilon(\alpha)>0$. By Lemma 4.3 there exists $\ell$ such that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ for all $l\geq\ell$. Now, fix any $0<\beta,\gamma<\varepsilon$ and $l\geq\ell$, pick $\xi=\xi(\beta,l)$ small enough and assume that $n$ is big enough. Let $Q(G)$ be the set of all $(G,l)$-marginal sequences that are $(\beta,l)$-judicious w.r.t. $p$. Because $\mathcal T_l$ is a finite set, there exists a number $N=N(\xi)$ such that for every factor graph $G$ there is a subset $Q^*(G)\subseteq Q(G)$ of size $|Q^*(G)|\leq N$ such that the following is true. If $(G,\sigma)$ is $(\beta,l)$-judicious w.r.t. $p$, then $\sigma\in\bigcup_{q\in Q^*(G)}\Sigma(G,l,q,\xi)$. Therefore, for all $G$ we have

$Z(G)\big\langle\mathbf 1\{(G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_G\leq N\max_{q\in Q(G)}Z_{l,q,\xi}(G).$ (4.8)

Proposition 4.7 and (4.8) imply that for $\xi$ small enough and $n$ large enough for any factor graph $G\in\mathcal A_{100l}$ there is $q^G\in Q(G)$ such that

$n^{-1}\ln\mathbb E\big[\mathbf 1\{\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\,\big|\,\boldsymbol G\cong_l G\big]\leq\mathcal B_{G,l}(q^G)+\alpha/2$
$\qquad\leq\alpha/2+\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T^G)\lambda_{G,l}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T^G)+\langle\ln\psi_T(\sigma)\rangle_{q_T^G}\big]\lambda_{G,l}(T|\mathcal T_F).$ (4.9)

Further, for any $j\in[\Delta]$ the function $\nu\in\mathcal P(\Omega^j)\mapsto H(\nu)$ is uniformly continuous because $\mathcal P(\Omega^j)$ is compact. By the same token, $\nu\mapsto\langle\ln\psi(\sigma)\rangle_\nu$ is uniformly continuous for any $\psi\in\Psi$. Consequently, if $G\in\mathcal L(\gamma,l)$ for some $\gamma<\varepsilon$ and $\varepsilon$ is chosen small enough, then we obtain

$\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T^G)\lambda_{G,l}(T|\mathcal T_V)<\mathbb E\big[(1-d_{\boldsymbol T})H(p_{\boldsymbol T})\,\big|\,\boldsymbol T\in\mathcal T_V\big]+\alpha/4.$ (4.10)

Similarly, because our choice of $\ell$ ensures that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ and because of the uniform degree bound $\Delta$, condition MA2 from Definition 4.1 implies that

$\mathbb E\Big[\sum_{j=1}^{d_{\boldsymbol T}}\|p_{l,\boldsymbol T_j}-p_{\boldsymbol T_j}\|_{\mathrm{TV}}\,\Big|\,\boldsymbol T\in\mathcal T_F\Big]<\varepsilon^{1/4}.$

Moreover, for any $\psi\in\Psi$ and any $\nu_1,\dots,\nu_{d_\psi}\in\mathcal P(\Omega)$ there is a unique distribution $\hat\nu\in\mathcal P(\Omega^{d_\psi})$ with marginals $\hat\nu^j=\nu_j$ that maximises $H(\hat\nu)+\langle\ln\psi(\sigma)\rangle_{\hat\nu}$ because the entropy is concave and the map $\mu\mapsto\langle\ln\psi(\sigma)\rangle_\mu$ is linear. In fact, the map $(\nu_1,\dots,\nu_{d_\psi})\mapsto\hat\nu$ is uniformly continuous. Therefore, MA2 and MA3 show that for $G\in\mathcal L(\gamma,l)$,

$\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T^G)+\langle\ln\psi_T(\sigma)\rangle_{q_T^G}\big]\lambda_{G,l}(T|\mathcal T_F)<\mathbb E\big[H(p_{\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\,\big|\,\boldsymbol T\in\mathcal T_F\big]+\alpha/4.$ (4.11)

Combining (4.9), (4.10) and (4.11), we get

$\ln\mathbb E\big[\mathbf 1\{\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\beta,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\,\big|\,\boldsymbol G\cong_l G\big]\leq n\big(\mathcal B_\vartheta(p)+\alpha\big).$ (4.12)

Finally, the assertion follows from (4.12) and Bayes’ rule.

Corollary 4.9

Suppose that $p$ is a marginal assignment. For any $\alpha>0$ there exists $\ell>0$ such that for all $l\geq\ell$,

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\ln\mathbb E[Z(\boldsymbol G)\,|\,\mathfrak T_l]\leq\mathcal B_\vartheta(p)-\alpha\,\Big|\,\mathcal A_{100l}\Big]=0.$

Choose a small $\varepsilon=\varepsilon(\alpha)>0$. By Lemma 4.3 there exists $\ell$ such that $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ for all $l\geq\ell$. Hence, fix some $l\geq\ell$ and define

$q:T\in\mathcal T_l\cap\mathcal T_V\mapsto\mathcal P(\Omega),\qquad T\mapsto p_{l,T}.$

Moreover, for $T\in\mathcal T_l\cap\mathcal T_F$ let $q_T\in\mathcal P(\Omega^{d_T})$ be the (unique) distribution that maximises $H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}$ subject to the condition that $q_T^j=q_{T_j}$ for all $j\in[d_T]$ (cf. (4.1)). Then $q$ is a $(G,l)$-marginal sequence. Indeed, MS1, MS2 are trivially satisfied and MS3 holds because $q_T^j=q_{T_j}$ for all $T\in\mathcal T_l\cap\mathcal T_F,j\in[d_T]$. Further, if we pick $\delta=\delta(\varepsilon,l)>0$ small enough, then Proposition 4.7 implies that for large $n$ and any $G\in\mathcal A_{100l}$

$n^{-1}\ln\mathbb E\big[Z_{l,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_l G\big]\geq\mathcal B_{G,l}(q)-\alpha/2=-\alpha/2+\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T)\lambda_{G,l}(T|\mathcal T_V)+\frac{|F_n|}{|V_n|}\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,l}(T|\mathcal T_F).$ (4.13)

To complete the proof, we need to compare the r.h.s. of (4.13) with $\mathcal B_\vartheta(p)$. Thus, let us write

$\beta_1(G)=\sum_{T\in\mathcal T_V\cap\mathcal T_l}(1-d_T)H(q_T)\lambda_{G,l}(T|\mathcal T_V),$
$\beta_2(G)=\sum_{T\in\mathcal T_F\cap\mathcal T_l}\big[H(q_T)+\langle\ln\psi_T(\sigma)\rangle_{q_T}\big]\lambda_{G,l}(T|\mathcal T_F).$

Because $\|\vartheta_l-\lambda_{\boldsymbol G,l}\|_{\mathrm{TV}}<\varepsilon$ w.h.p. by (3.9), $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ by our choice of $\ell$, the entropy is uniformly continuous on $\mathcal P(\Omega)$ and $d_T\leq\Delta$ is uniformly bounded for all $T$, (3.11) ensures that we can make $\varepsilon$ so small that

$\lim_{n\to\infty}\mathbb P\Big[\mathbb E\big[\big|\beta_1(\boldsymbol G)-\mathbb E[(1-d_{\boldsymbol T})H(p_{\boldsymbol T})\,|\,\boldsymbol T\in\mathcal T_V]\big|\,\big|\,\mathfrak T_l\big]>\alpha/4\,\Big|\,\mathcal A_{100l}\Big]=0.$ (4.14)

A similar argument applies to $\beta_2(\boldsymbol G)$. Indeed, since $\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon$ and because all degrees are bounded by $\Delta$, condition MA2 from Definition 4.1 implies that

$\mathbb E\Big[\sum_{j=1}^{d_{\boldsymbol T}}\|p_{l,\boldsymbol T_j}-p_{\boldsymbol T_j}\|_{\mathrm{TV}}\,\Big|\,\boldsymbol T\in\mathcal T_F\Big]<\varepsilon^{1/4}$

provided $\varepsilon$ is small enough. In effect, because for any $\psi\in\Psi$ the function $\mu\in\mathcal P(\Omega^{d_\psi})\mapsto H(\mu)+\langle\ln\psi(\sigma)\rangle_\mu$ is uniformly continuous, MA3 and the construction of $q_T$ for $T\in\mathcal T_l\cap\mathcal T_F$ ensure that

$\mathbb E\big[\big|H(q_{\partial^{l+1}\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{q_{\partial^{l+1}\boldsymbol T}}-H(p_{\boldsymbol T})-\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\big|\,\big|\,\boldsymbol T\in\mathcal T_F\big]<\alpha/8.$ (4.15)

Since $\|\vartheta_l-\lambda_{\boldsymbol G,l}\|_{\mathrm{TV}}<\varepsilon$ w.h.p. by (3.9), (4.15) implies that

$\lim_{n\to\infty}\mathbb P\Big[\mathbb E\big[\big|\beta_2(\boldsymbol G)-\mathbb E[H(p_{\boldsymbol T})+\langle\ln\psi_{\boldsymbol T}(\sigma)\rangle_{p_{\boldsymbol T}}\,|\,\boldsymbol T\in\mathcal T_F]\big|\,\big|\,\mathfrak T_l\big]>\alpha/4\,\Big|\,\mathcal A_{100l}\Big]=0.$ (4.16)

Finally, the assertion follows from (4.13), (4.14) and (4.16).

4.6. Proof of Theorem 4.4

We begin by spelling out the following consequence of the symmetry assumption. Let p be a marginal assignment.

Lemma 4.10

If $\underline M$ is p-symmetric, then for any $\varepsilon>0$ and all sufficiently large $\ell$ we have

$\lim_{n\to\infty}\mathbb P\Big[\sum_{x\in V_n}\|\mu_{\boldsymbol G,x}-p_{\ell,\partial^\ell[\boldsymbol G,x]}\|_{\mathrm{TV}}>\varepsilon n\Big]=\lim_{n\to\infty}\mathbb P\big[\mu_{\boldsymbol G}\ \text{fails to be }(\varepsilon,2)\text{-symmetric}\big]=0;$ (4.17)

if the planted distribution of $\underline M$ is p-symmetric as well, then additionally

$\lim_{n\to\infty}\mathbb P\Big[\sum_{x\in V_n}\|\mu_{\hat{\boldsymbol G},x}-p_{\ell,\partial^\ell[\hat{\boldsymbol G},x]}\|_{\mathrm{TV}}>\varepsilon n\Big]=\lim_{n\to\infty}\mathbb P\big[\mu_{\hat{\boldsymbol G}}\ \text{fails to be }(\varepsilon,2)\text{-symmetric}\big]=0.$ (4.18)

Choose $\eta=\eta(\varepsilon)>0$ small enough. For an integer $\ell>0$ consider the event

$\mathcal E_\ell=\Big\{\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}<\eta^2n^2\Big\}.$

If $\underline M$ is p-symmetric, then $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal E_\ell]=1$ for sufficiently large $\ell$. Similarly, if the planted distribution is p-symmetric, then $\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}\in\mathcal E_\ell]=1$ for large $\ell$.

Hence, assume that $G\in\mathcal E_\ell$. Then by the triangle inequality, for any $\omega\in\Omega$,

$\frac1n\sum_{x\in V_n}\big|p_{\ell,\partial^\ell[G,x]}(\omega)-\mu_{G,x}(\omega)\big|=\frac1{n^2}\sum_{x\in V_n}\Big|\Big[\sum_{y\in V_n}\sum_{\omega'\in\Omega}p_{\ell,\partial^\ell[G,x]}(\omega)\,p_{\ell,\partial^\ell[G,y]}(\omega')\Big]-\Big[\sum_{y\in V_n}\sum_{\omega'\in\Omega}\mu_{G,\{x,y\}}(\omega,\omega')\Big]\Big|\leq\eta^2.$

Therefore,

$\frac1n\sum_{x\in V_n}\|p_{\ell,\partial^\ell[G,x]}-\mu_{G,x}\|_{\mathrm{TV}}\leq\eta^2|\Omega|<\eta.$ (4.19)

Furthermore, by (4.19) and the triangle inequality,

$\frac1{n^2}\sum_{x,y\in V_n}\|\mu_{G,x}\otimes\mu_{G,y}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}\leq2\eta.$ (4.20)

Since $G\in\mathcal E_\ell$, (4.20) entails that

$\frac1{n^2}\sum_{x,y\in V_n}\|\mu_{G,x}\otimes\mu_{G,y}-\mu_{G,\{x,y\}}\|_{\mathrm{TV}}\leq3\eta<\varepsilon,$

i.e., $G$ is $(\varepsilon,2)$-symmetric.

Together with Lemma 4.10 the following lemma shows that under the assumptions of Theorem 4.4 the partition function is dominated by its judicious part.

Lemma 4.11

There is a number $\varepsilon_0=\varepsilon_0(\Delta,\Omega,\Psi,\Theta)$ such that for all $0<\varepsilon<\varepsilon_0,\ell>0$ there exists $\chi>0$ such that for large enough $n$ the following is true. If $G\in\mathcal G(M_n)$ is a $(2\ell+5)$-acyclic factor graph such that

$\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon^3n$ (4.21)

and $\mu_G$ is $(\chi,2)$-symmetric, then

$\big\langle\mathbf 1\{(G,\sigma)\ \text{is }(\varepsilon,\ell)\text{-judicious w.r.t. }p\}\big\rangle_G\geq1/2.$

Pick $\delta=\delta(\ell,\varepsilon)>0$ small, $\beta=\beta(\delta)$ and $\gamma=\gamma(\beta)$ smaller and $\chi=\chi(\gamma)>0$ smaller still and assume that $n>n_0(\chi)$. Let $V_0$ be the partition of $V_n$ such that $x,y\in V_n$ belong to the same class iff $\partial^{\ell+2}[G,x]=\partial^{\ell+2}[G,y]$. By Theorem 2.1 there exists a refinement $V$ of $V_0$ such that $\mu_G$ is $\gamma$-homogeneous with respect to $(V,S)$ for some partition $S$ of $\Omega^n$ such that $\#V+\#S\leq N=N(\gamma)$. We may index the classes of $V$ as $V_{T,i}$ with $T=\partial^{\ell+2}[G,x]$ for all $x$ in the class and $i\in[N_T]$ for some integer $N_T$.

Let $J$ be the set of all $j\in[\#S]$ such that $\mu_G(S_j)\geq\delta^6/N$ and $\mu_G[\cdot|S_j]$ is $\gamma$-regular. Then

$\sum_{j\in J}\mu_G(S_j)\geq1-\delta^6.$ (4.22)

Choosing $\chi$ small enough, we obtain from Corollary 2.4 that

$\frac1n\sum_{x\in V_n}\|\mu_{G,x}[\cdot|S_j]-\mu_{G,x}\|_{\mathrm{TV}}<\delta^7\qquad\text{for all }j\in J.$

Therefore, by (4.21) and the triangle inequality, for $j\in J$ we get

$\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\,\big|\,S_j\big\rangle_G\leq\delta^7+\frac1n\sum_{T,i}|V_{T,i}|\,\big\|\big\langle\sigma[\cdot|V_{T,i}]\,\big|\,S_j\big\rangle_G-p_{\ell,T}\big\|_{\mathrm{TV}}\qquad\text{[by HM2]}$
$\qquad\leq\delta^7+\frac1n\sum_{T,i}\sum_{x\in V_{T,i}}\big\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}<3\varepsilon^3.$

Consequently, by (4.22), Bayes' rule and the triangle inequality, summing on $j$ and using that the total variation distance is bounded by one, we get

$\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\big\rangle_G\leq3\varepsilon^3+\delta^6<4\varepsilon^3.$ (4.23)

Applying the triangle inequality once more, we find

$\Big\langle\sum_{T\in\mathcal T_V\cap\mathcal T_\ell}\lambda_{G,\ell}[T|\mathcal T_V]\,\|q_{G,\sigma,\ell,T}-p_{\ell,T}\|_{\mathrm{TV}}\Big\rangle_G\leq\frac1n\sum_{T,i}|V_{T,i}|\,\big\langle\|\sigma[\cdot|V_{T,i}]-p_{\ell,T}\|_{\mathrm{TV}}\big\rangle_G<4\varepsilon^3.$ (4.24)

Further, consider $T\in\mathcal T_F\cap\mathcal T_\ell$ such that $\lambda_{G,\ell}[T|\mathcal T_F]>0$ and let $j\in[d_T]$. Because $G$ is $(2\ell+5)$-acyclic, there exists a set $\Gamma(T,j)\subseteq\mathcal T_{\ell+2}\cap\mathcal T_V$ with the following two properties. First, for every constraint node $a$ with $\partial^{\ell+1}[G,a]=T$ the variable node $x=\partial(G,a,j)$ satisfies $\partial^{\ell+2}[G,x]\in\Gamma(T,j)$. Second, for every variable node $x$ with $\partial^{\ell+2}[G,x]\in\Gamma(T,j)$ there is a constraint node $a$ with $\partial^{\ell+1}[G,a]=T$ such that $\partial(G,a,j)=x$. For $R\in\Gamma(T,j)$ let $m_{R,T,i,j}$ be the number of constraint nodes $a$ with $\partial^{\ell+1}[G,a]=T$ such that $x=\partial(G,a,j)$ belongs to $V_{R,i}$. Then by the triangle inequality,

$\Big\langle\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\lambda_{G,\ell}[T|\mathcal T_F]\,\|q_{G,\sigma,\ell,T}^j-p_{\ell,T_j}\|_{\mathrm{TV}}\Big\rangle_G\leq\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}\sum_{R\in\Gamma(T,j)}\sum_{i\in[N_R]}\frac{m_{R,T,i,j}}{|F_n|}\big\langle\|\sigma[\cdot|V_{R,i}]-p_{\ell,R}\|_{\mathrm{TV}}\big\rangle_G$
$\qquad\leq\frac{\Delta^2}{n}\sum_{R\in\mathcal T_{\ell+2}\cap\mathcal T_V}\sum_{i\in[N_R]}|V_{R,i}|\,\big\langle\|\sigma[\cdot|V_{R,i}]-p_{\ell,R}\|_{\mathrm{TV}}\big\rangle_G;$ (4.25)

the last inequality follows because all degrees are between one and $\Delta$. Finally, the assertion follows from (4.24) and (4.25).

We proceed by proving the upper bound and the lower bound statement from Theorem 4.4 separately. Strictly speaking, the proof of the lower bound implies the upper bound as well. But presenting the arguments separately makes them slightly easier to follow.

Proof of Theorem 4.4

upper bound. We assume that $\underline M$ is p-symmetric. Pick and fix a number $\alpha>0$; we aim to show that for large enough $n$,

$\frac1n\mathbb E[\ln Z(\boldsymbol G)]\leq\mathcal B_\vartheta(p)+4\alpha.$ (4.26)

For $\varepsilon,l>0$ let

$\mathcal R(\varepsilon,l)=\Big\{\sum_{x\in V_n}\|\mu_{\boldsymbol G,x}-p_{l,\partial^l[\boldsymbol G,x]}\|_{\mathrm{TV}}<\varepsilon n\Big\}.$

Additionally, let $\mathcal S(\chi)$ be the event that $\mu_{\boldsymbol G}$ is $(\chi,2)$-symmetric and let $\mathcal L(\varepsilon,l)$ be the event that $\|\lambda_{\boldsymbol G,l}-\vartheta_l\|_{\mathrm{TV}}<\varepsilon$. Corollary 4.8 shows that for some small enough $\varepsilon>0$ and large enough $\ell$ (both dependent on $\alpha$), for all $l\geq\ell$ and large enough $n$ we have

$\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}\big]\leq\mathcal B_\vartheta(p)+\alpha.$ (4.27)

To apply this bound we are going to argue that $Z(\boldsymbol G)\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p\}\rangle_{\boldsymbol G}$ is not much smaller than $Z(\boldsymbol G)$ for most $\boldsymbol G$.

The proof of this fact is based on Lemma 4.11. To apply it, we need to pick and fix some specific, large enough $\ell^*>\ell$ (upon which the value of $\chi$ provided by Lemma 4.11 will depend). By Lemma 4.3 there is $\ell_1\geq\ell$ such that

$\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon^4\qquad\text{for all }l\geq\ell_1.$ (4.28)

Further, by Lemma 4.10 there is $\ell_2\geq\ell$ such that

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R(\varepsilon^3,l)]=1\qquad\text{for all }l\geq\ell_2.$

Let $\ell^*=\ell+\ell_1+\ell_2$. Now, Lemma 4.11 yields $\chi=\chi(\varepsilon,\ell^*)>0$ such that the following is true. Consider the event $\mathcal U=\mathcal S(\chi)\cap\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal R(\varepsilon^3,\ell^*)$. Then

$\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\leq2\cdot\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\big\langle\mathbf 1\{(\boldsymbol G,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p\}\big\rangle_{\boldsymbol G}.$ (4.29)

Combining (4.27) and (4.29) we obtain

$\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)\big]\leq2\exp\big(n(\mathcal B_\vartheta(p)+\alpha)\big).$ (4.30)

Hence, we are left to estimate the probability of the event $\mathcal U\cap\mathcal A_{100\ell^*}$. With respect to $\mathcal U$ we obtain from Lemma 4.10 that $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal S(\chi)]=1$ and $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R(\varepsilon^3,\ell^*)]=1$. Moreover, the local convergence assumption (3.9) implies that $\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal L(\varepsilon^4,\ell^*)]=1$. Consequently,

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U]=1.$ (4.31)

Hence, the high girth assumption (3.11) yields

$\liminf_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}]>0.$ (4.32)

Finally, combining (4.30) and (4.32) and using Markov's inequality, we obtain

$\lim_{n\to\infty}\mathbb P\big[Z(\boldsymbol G)>\exp(n(\mathcal B_\vartheta(p)+2\alpha))\,\big|\,\mathcal U\cap\mathcal A_{100\ell^*}\big]=0.$

Further, since (4.32) shows that the probability of the event $\mathcal U\cap\mathcal A_{100\ell^*}$ is bounded away from 0, Proposition 3.2 yields

$\lim_{n\to\infty}\mathbb P\big[Z(\boldsymbol G)>\exp(n(\mathcal B_\vartheta(p)+3\alpha))\big]=0.$ (4.33)

Because $|n^{-1}\ln Z(\boldsymbol G)|$ is bounded by some number $C=C(\Delta,\Omega,\Psi,\Theta)>0$ by the definition (3.4) of $Z$, (4.26) follows from (4.33).

To establish the lower bound we introduce a construction reminiscent of those used in [24, 25, 31, 39, 48]. Namely, starting from the sequence $\underline M$ of $(\Delta,\Omega,\Psi,\Theta)$-models, we define another sequence $\underline M^\otimes=(M_n^\otimes)_n$ of models as follows. Let $\Omega^\otimes=\Omega\times\Omega$ and let us denote pairs $(\omega,\omega')\in\Omega^\otimes$ by $\omega\omega'$. Further, for any $\psi:\Omega^h\to(0,\infty)$ we define a function

$\psi^\otimes:(\Omega^\otimes)^h\to(0,\infty),\qquad(\omega_1\omega_1',\dots,\omega_h\omega_h')\mapsto\psi(\omega_1,\dots,\omega_h)\cdot\psi(\omega_1',\dots,\omega_h').$

Let $\Psi^\otimes=\{\psi^\otimes:\psi\in\Psi\}$. Then the $(\Delta,\Omega,\Psi,\Theta)$-model $M_n=(V_n,F_n,d_n,t_n,(\psi_a)_{a\in F_n})$ gives rise to the $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-model $M_n^\otimes=(V_n,F_n,d_n,t_n,(\psi_a^\otimes)_{a\in F_n})$.

Clearly, there is a canonical bijection $\mathcal G(M)\to\mathcal G(M^\otimes),\ G\mapsto G^\otimes$. Moreover, the construction ensures that the Gibbs measure $\mu_{G^\otimes}\in\mathcal P((\Omega^\otimes)^n)$ equals $\mu_G\otimes\mu_G$. Explicitly, for all $\omega_1,\omega_1',\dots,\omega_n,\omega_n'\in\Omega$,

$\mu_{G^\otimes}(\omega_1\omega_1',\dots,\omega_n\omega_n')=\mu_G(\omega_1,\dots,\omega_n)\,\mu_G(\omega_1',\dots,\omega_n').$ (4.34)

In effect, we obtain

$Z(G^\otimes)=Z(G)^2.$ (4.35)

Further, writing $\mathcal G^\otimes,\mathcal T^\otimes$ for the $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-templates and the acyclic $(\Delta,\Omega^\otimes,\Psi^\otimes,\Theta)$-templates, we can lift the marginal assignment $p$ from $\mathcal T$ to $\mathcal T^\otimes$ by letting $p^\otimes_{T^\otimes}=p_T\otimes p_T$ for all $T$. Additionally, let $\vartheta^\otimes\in\mathcal P(\mathcal T^\otimes)$ be the image of $\vartheta$ under the map $T\in\mathcal T\mapsto T^\otimes$ so that

$\mathcal B_{\vartheta^\otimes}(p^\otimes)=2\mathcal B_\vartheta(p).$ (4.36)
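The identity (4.35) is easy to verify by brute force on a toy instance; in the sketch below (ours), `lift` implements $\psi\mapsto\psi^\otimes$ with pairs $\omega\omega'$ encoded as integers in $\{0,\dots,3\}$.

```python
import numpy as np
from itertools import product

# brute-force check of Z(G^x) = Z(G)^2 on a toy model (weights ours)
psis = {(0, 1): np.array([[2., 1.], [1., 2.]]),
        (1, 2): np.array([[1., 3.], [3., 1.]])}
n = 3

def Z(weight_tables, qsize):
    tot = 0.0
    for s in product(range(qsize), repeat=n):
        w = 1.0
        for (u, v), tab in weight_tables.items():
            w *= tab[s[u], s[v]]
        tot += w
    return tot

def lift(tab):
    # psi^x(ww') = psi(w) * psi(w'), with pair (a, b) encoded as 2a + b
    out = np.zeros((4, 4))
    for a, b, c, d in product(range(2), repeat=4):
        out[2 * a + b, 2 * c + d] = tab[a, c] * tab[b, d]
    return out

psis2 = {f: lift(tab) for f, tab in psis.items()}
print(Z(psis, 2) ** 2, Z(psis2, 4))  # equal
```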

Proof of Theorem 4.4

lower bound. We assume that $\underline M$ is p-symmetric and that the same is true of the planted distribution. For $\varepsilon,l>0$ consider the event

$\mathcal R^\otimes(\varepsilon,l)=\Big\{\frac1n\sum_{x\in V_n}\|\mu_{\boldsymbol G^\otimes,x}-p^\otimes_{l,\partial^l[\boldsymbol G,x]^\otimes}\|_{\mathrm{TV}}<\varepsilon\Big\}$ (4.37)

and let $\mathcal S(\chi)$ be the event that $\mu_{\boldsymbol G^\otimes}$ is $(\chi,2)$-symmetric. Moreover, as before let $\mathcal L(\varepsilon,\ell)=\{\|\lambda_{\boldsymbol G,\ell}-\vartheta_\ell\|_{\mathrm{TV}}<\varepsilon\}$. Basically, we are going to apply the same argument as in the proof of the upper bound to the random factor graph $\boldsymbol G^\otimes$ and to $\hat{\boldsymbol G}_{\hat l}$ for a large enough $\hat l$.

Hence, let $\alpha>0$. Then Corollary 4.8 applied to $\underline M^\otimes$ yields a small $\varepsilon=\varepsilon(\alpha)>0$ and a large $\ell=\ell(\alpha)>0$ such that for all $l\geq\ell$ and large enough $n$ we have

$\frac1n\ln\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,l)\cap\mathcal A_{100l}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,l)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}\big]\leq\mathcal B_{\vartheta^\otimes}(p^\otimes)+\alpha.$ (4.38)

Further, by Lemma 4.3 and (4.34) there exists $\ell_1\geq\ell$ such that

$\mathbb E[\|p_{l,\boldsymbol T}-p_{\boldsymbol T}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]+\mathbb E[\|p^\otimes_{l,\boldsymbol T^\otimes}-p^\otimes_{\boldsymbol T^\otimes}\|_{\mathrm{TV}}\,|\,\boldsymbol T\in\mathcal T_V]<\varepsilon^4\qquad\text{for all }l\geq\ell_1.$ (4.39)

Moreover, Lemma 4.10 shows that for some $\ell_2\geq\ell$ we have

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal R^\otimes(\varepsilon^3,l)]=1\qquad\text{for all }l\geq\ell_2.$ (4.40)

Similarly, (4.34), (4.39), the planted p-symmetry assumption and Lemma 4.10 imply that there is $\ell_3\geq\ell$ such that for any $l>\ell_3$ and large enough $\hat l$ we have

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal R^\otimes(\varepsilon^3,l)]=1.$ (4.41)

Additionally, Corollary 4.9 shows that for a certain $\ell_4\geq\ell$ we have

$\lim_{n\to\infty}\mathbb P\Big[\frac1n\ln\mathbb E[Z(\boldsymbol G)\,|\,\mathfrak T_l]\leq\mathcal B_\vartheta(p)-\alpha\,\Big|\,\mathcal A_{100l}\Big]=0\qquad\text{for all }l\geq\ell_4.$ (4.42)

Let $\ell^*=\ell_1+\ell_2+\ell_3+\ell_4$.

Applying Lemma 4.11 to $\underline M^\otimes$, we obtain $\chi^*=\chi^*(\varepsilon,\ell^*)>0$ such that the following is true: let $\mathcal U=\mathcal S(\chi^*)\cap\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal R^\otimes(\varepsilon^3,\ell^*)$ and define $Z'(\boldsymbol G)=\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G)$. Then (using (4.35))

$Z'(\boldsymbol G)^2=\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)$
$\qquad\leq2\cdot\mathbf 1\{\boldsymbol G\in\mathcal U\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}.$ (4.43)

Further, Proposition 2.5 and Lemma 4.10 imply that

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal S(\chi^*)]=1.$ (4.44)

Combining (3.9), (4.40) and (4.44), we get

$\lim_{n\to\infty}\mathbb P[\boldsymbol G\in\mathcal U]=1.$ (4.45)

Now, (4.36), (4.38) and (4.43) give an upper bound on the second moment of $Z'$, namely

$\mathbb E[Z'(\boldsymbol G)^2]\leq2\,\mathbb E\big[\mathbf 1\{\boldsymbol G\in\mathcal L(\varepsilon^4,\ell^*)\cap\mathcal A_{100\ell^*}\}Z(\boldsymbol G^\otimes)\big\langle\mathbf 1\{(\boldsymbol G^\otimes,\sigma)\ \text{is }(\varepsilon,\ell^*)\text{-judicious w.r.t. }p^\otimes\}\big\rangle_{\boldsymbol G^\otimes}\big]\leq2\exp\big(n(2\mathcal B_\vartheta(p)+\alpha)\big).$ (4.46)

As a next step, we are going to show that

$\mathbb E[Z'(\boldsymbol G)]\geq\exp\big(n(\mathcal B_\vartheta(p)-2\alpha)\big).$ (4.47)

Indeed, by Proposition 2.5 and Lemma 4.10 we have

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal S(\chi^*)]=1$ (4.48)

for large enough $\hat l$. Further, the local convergence assumption (3.9) and the construction (3.10) of the planted distribution ensure that for large enough $\hat l$,

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal L(\varepsilon^4,\ell^*)]=1.$

Hence, (4.41) and (4.48) show that for $\hat l$ large enough

$\lim_{n\to\infty}\mathbb P[\hat{\boldsymbol G}_{\hat l}\in\mathcal U]=1.$ (4.49)

Thus, (4.42), (4.49) and Proposition 3.8 yield (4.47).

Finally, combining (4.46) and (4.47) and applying the Paley-Zygmund inequality (1.3), we obtain for large $n$,

$\mathbb P\big[Z(\boldsymbol G)\geq\exp(n(\mathcal B_\vartheta(p)-4\alpha))\big]\geq\mathbb P\big[Z'(\boldsymbol G)\geq\exp(n(\mathcal B_\vartheta(p)-4\alpha))\big]$
$\qquad\geq\frac{\mathbb E[Z'(\boldsymbol G)]^2}{2\,\mathbb E[Z'(\boldsymbol G)^2]}\geq\exp(-10\alpha n).$

Because this holds for any $\alpha>0$, the assertion follows from Proposition 3.2.

4.7. Proof of Theorem 4.5

The key step of the proof is to establish the following statement.

Lemma 4.12

For any $\varepsilon>0$ there exists $\delta>0$ such that for any $\ell>0$ there exists $n_0$ such that for all $n>n_0$ the following is true. Assume that $G\in\mathcal G(M_n)$ satisfies

$\frac1n\sum_{x\in V_n}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_G<\delta^9.$ (4.50)

Then $G$ is $(\varepsilon,2)$-symmetric and

$\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon n.$

Before we prove Lemma 4.12 let us show how it implies Theorem 4.5.

Proof of Theorem 4.5

If $G\in\mathcal G(M_n)$ is $(\varepsilon,2)$-symmetric and satisfies $\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<\varepsilon n$, then by the triangle inequality

$\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}\leq\sum_{x,y\in V_n}\|\mu_{G,\{x,y\}}-\mu_{G,x}\otimes\mu_{G,y}\|_{\mathrm{TV}}$
$\qquad+\|\mu_{G,x}\otimes\mu_{G,y}-p_{\ell,\partial^\ell[G,x]}\otimes p_{\ell,\partial^\ell[G,y]}\|_{\mathrm{TV}}$
$\qquad\leq4\varepsilon n^2.$

Therefore, the theorem follows by applying Lemma 4.12 either to the random factor graph $\boldsymbol G$ or to the random factor graph $\hat{\boldsymbol G}$ chosen from the planted model.

Proof of Lemma 4.12

The proof is morally similar to the one for the special case of the "stochastic block model" from [40]. Let $\gamma=\gamma(\varepsilon)>0$ be sufficiently small. By Theorem 2.1 we can pick $\delta=\delta(\gamma)>0$ small enough so that there exists a partition $(V,S)$ with $\#V+\#S<\delta^{-1}$ with respect to which $\mu_G$ is $\gamma^4$-homogeneous. Suppose that $V_i$, $S_j$ are classes such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$ and such that $\mu_G[\cdot|S_j]$ is $\gamma^4$-regular on $V_i$. We claim that

$\frac1{|V_i|}\sum_{x\in V_i}\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<3\gamma.$ (4.51)

The assertion is immediate from this inequality. Indeed, suppose that (4.51) is true for all $i,j$ such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$ and $\mu_G[\cdot|S_j]$ is $\gamma^4$-regular on $V_i$. Then because $\#V+\#S\leq1/\delta$,

$\sum_{x\in V_n}\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<4\gamma n.$ (4.52)

Hence, by HM1 and Bayes' rule, $\sum_{x\in V_n}\|\mu_{G,x}-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}<5\gamma n<\varepsilon n$. Further, (4.52) and Lemma 2.8 imply that $\mu_G$ is $(\varepsilon,2)$-symmetric (provided that we pick $\gamma$ small enough). Thus, we are left to prove (4.51).

Assume for contradiction that (4.51) is violated for $V_i$, $S_j$ such that $|V_i|\geq\delta^{3/2}n$, $\mu_G(S_j)\geq\delta^{3/2}$. Then by the triangle inequality there is a set $W\subseteq V_i$ of size at least $\gamma|V_i|$ such that for all $x\in W$ we have

$\|\mu_{G,x}[\cdot|S_j]-p_{\ell,\partial^\ell[G,x]}\|_{\mathrm{TV}}\geq\gamma.$

For $x\in W$ pick $\omega_x\in\Omega$ such that $|\mu_{G,x}[\omega_x|S_j]-p_{\ell,\partial^\ell[G,x]}(\omega_x)|\geq\gamma$ is maximum. Then by the pigeonhole principle there exist $\omega\in\Omega$ and $W'\subseteq W$, $|W'|\geq|W|/(2|\Omega|)$, such that either

$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\geq p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma\qquad\text{or}$ (4.53)
$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\leq p_{\ell,\partial^\ell[G,x]}(\omega)-\gamma.$ (4.54)

In particular, since the entries of both distributions sum to one (and after shrinking $W'$ by another factor of $|\Omega|$ if necessary), for some $\omega$ we have

$\forall x\in W':\ \mu_{G,x}[\omega|S_j]\geq p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma/|\Omega|.$ (4.55)

We claim that there is a set $L\subseteq W'$ of size $|L|=\lceil1/\delta\rceil$ with the following properties.

  • (i)

    the pairwise distance between any two $x,y\in L$ is at least $10(\ell+1)$.

  • (ii)
    for all $x\in L$ we have
    $\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\mu_G[\cdot|S_j]}<\delta^4.$ (4.56)

Indeed, because $|V_i|\geq\delta^2n$ and $\mu_G(S_j)\geq\delta^2$ the assumption (4.50) implies that

$\sum_{x\in V_i}\Big\langle\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\Big\rangle_{\mu_G[\cdot|S_j]}<\delta^5|V_i|.$ (4.57)

Since $|W'|\geq\gamma|V_i|/(2|\Omega|)\geq\delta|V_i|$, (4.57) implies that there is a set $W''\subseteq W'$ of size $|W''|\geq|W'|/2$ such that (4.56) holds for all $x\in W''$. Now, construct a sequence $W''=W_0\supseteq W_1\supseteq\cdots$ inductively as follows. In step $i\geq1$ pick some $x_i\in W_{i-1}$. Then $W_i$ contains all $y\in W_{i-1}\setminus\{x_i\}$ whose distance from $x_i$ is greater than $10(\ell+1)$. Since for each $x_i$ the total number of variable nodes at distance at most $10(\ell+1)$ is bounded by $\Delta^{10(\ell+1)}$ and $|W_0|\geq\delta|V_i|/2\geq\delta^3n/2$, the set $\{x_i:i\geq1\}$ of picked nodes has size at least $\delta^3\Delta^{-10(\ell+1)}n/2>1/\delta$, provided that $n$ is large enough. Finally, simply pick any subset $L\subseteq\{x_i:i\geq1\}$ of size $|L|=\lceil1/\delta\rceil$.

Consider the event $\mathcal E=\{\sigma[\omega|L]\leq|L|^{-1}\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)+\gamma^3\}$. We claim that

$\mu_G[\bar{\mathcal E}|S_j]\leq2\delta^2/\mu_G(S_j).$ (4.58)

Indeed, by (4.56), Markov's inequality and the union bound we have

$\Big\langle\mathbf 1\big\{\forall x\in L:\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\leq\delta\big\}\Big\rangle_{\mu_G[\cdot|S_j]}\geq1-\sum_{x\in L}\Big\langle\mathbf 1\big\{\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}>\delta\big\}\Big\rangle_{\mu_G[\cdot|S_j]}\geq1-\delta^2.$ (4.59)

Now, let $\mathcal F_L$ be the coarsest $\sigma$-algebra that contains $\mathcal F_\ell(G,x)$ for all $x\in L$. Suppose that $\sigma'\in S_j$ is such that

$\big\|\big\langle\sigma[\cdot|x]\,\big|\,\mathcal F_\ell(G,x)\big\rangle_G(\sigma')-p_{\ell,\partial^\ell[G,x]}\big\|_{\mathrm{TV}}\leq\delta\qquad\text{for all }x\in L.$ (4.60)

We claim that (4.60) implies

$\big\langle\mathbf 1\{\sigma\in\bar{\mathcal E}\}\,\big|\,\mathcal F_L\big\rangle_G(\sigma')<\delta^3.$ (4.61)

Indeed, let $X=\sum_{x\in L}\mathbf 1\{\sigma(x)=\omega\}$. Then (4.60) implies that

$\big\langle X\,\big|\,\mathcal F_L\big\rangle_G(\sigma')\leq2\delta|L|+\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega).$ (4.62)

Furthermore, the pairwise distance of the variables in $L$ is at least $2(\ell+1)$ and given $\mathcal F_L$ the values of the variables at distance either $\ell$ or $\ell+1$ from each $x\in L$ are fixed. Therefore, given $\mathcal F_L$ the events $\{\sigma(x)=\omega\}$, $x\in L$, are mutually independent. In effect, $X$ is stochastically dominated by a sum of independent random variables. Hence, recalling that $\delta$ is much smaller than $\gamma$, we see that (4.61) follows from (4.62) and the Chernoff bound. Finally, combining (4.59) and (4.61), and observing that the event in (4.59) is $\mathcal F_L$-measurable, we obtain (4.58).
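The domination step can be illustrated by a quick simulation (ours; all numbers arbitrary): for a sum of independent indicators, the upper tail beyond the mean is exponentially small, in line with the Chernoff/Hoeffding bound invoked above.

```python
import numpy as np
rng = np.random.default_rng(0)

# given the boundary sigma-algebra F_L the events {sigma(x)=omega} are
# independent, so X is dominated by a sum of independent Bernoullis;
# a quick look at the Chernoff-type tail used above (numbers ours)
L, p_mean, slack = 200, 0.4, 0.1
X = rng.random((100000, L)) < p_mean
tail = (X.sum(axis=1) >= (p_mean + slack) * L).mean()
print(tail, np.exp(-2 * slack**2 * L))  # empirical tail vs Hoeffding bound
```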

But (4.58) does not sit well with (4.55). In fact, (4.55) entails that $\mu_G[\bar{\mathcal E}|S_j]\geq\gamma^2$; for consider the random variable $Y=\sum_{x\in L}\mathbf 1\{\sigma(x)\neq\omega\}$. Then (4.55) yields $\langle Y\rangle_{\mu_G[\cdot|S_j]}=\sum_{x\in L}(1-\mu_{G,x}[\omega|S_j])\leq|L|(1-\gamma/|\Omega|)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)$. Hence, by Markov's inequality

$\mu_G[\mathcal E|S_j]\leq\frac{\langle Y\rangle_{\mu_G[\cdot|S_j]}}{|L|(1-\gamma^3)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}\leq\frac{|L|(1-\gamma/|\Omega|)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}{|L|(1-\gamma^3)-\sum_{x\in L}p_{\ell,\partial^\ell[G,x]}(\omega)}$
$\qquad\leq\frac{1-\gamma/|\Omega|}{1-\gamma^3}\leq1-\gamma^2.$

Combining this bound with (4.58), we obtain $\gamma^2\leq\mu_G[\bar{\mathcal E}|S_j]\leq2\delta^2/\mu_G(S_j)$. Thus, choosing $\delta$ much smaller than $\gamma$, we conclude that $\mu_G(S_j)<\delta^{3/2}$, which is a contradiction. Thus, we have established (4.51).

5. CONDITIONING ON THE LOCAL STRUCTURE

5.1. A Generalised Configuration Model

The aim in this section is to prove Proposition 4.7. The obvious problem is the conditioning on the $\sigma$-algebra $\mathfrak T_\ell$ that fixes the depth-$\ell$ neighborhoods of all variable nodes and the depth-$(\ell+1)$ neighborhoods of all constraint nodes. Following [16], we deal with this conditioning by setting up a generalised configuration model.

Recall that $\mathcal T_\ell$ is the (finite) set of all isomorphism classes $\partial^\ell T$ for $T\in\mathcal T_V$ and $\partial^{\ell+1}T$ for $T\in\mathcal T_F$. Let $\ell,n>0$ be integers and let $M=(V,F,d,t,(\psi_a)_{a\in F})$ be a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$. Moreover, let $G\in\mathcal G(M)$ be a $100\ell$-acyclic factor graph. Then we define an enhanced $(\Delta,\Omega,\Psi,\Theta_\ell)$-model $M(G,\ell)$ with type set $\Theta_\ell=(\mathcal T_\ell\cap\mathcal T_V)\times[\Delta]$ as follows. The set of variable nodes is $V$, the set of constraint nodes is $F$, the degrees are given by $d$ and the weight function associated with each constraint $a$ is $\psi_a$, just as in $M$. Moreover, the type of a variable clone $(x,i)$ is $t_{G,\ell}(x,i)=(\partial^\ell[G,x],i)$. Further, the type of a constraint clone $(a,j)$ such that $\partial(G,a,j)=(x,i)$ is $t_{G,\ell}(a,j)=(\partial^\ell[G,x],i)$. Clearly, $\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. The following lemma shows that the model $M(G,\ell)$ can be used to generate factor graphs whose local structure coincides with that of $G$.
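In spirit, the generalised configuration model pairs variable clones with constraint clones of identical type uniformly at random, as in the following miniature sketch (ours; the type labels are arbitrary stand-ins for the pairs $(\partial^\ell[G,x],i)$).

```python
import random
from collections import defaultdict

# a generalised configuration model in miniature: pair variable clones
# with constraint clones of the same type uniformly at random
random.seed(1)
var_clones = [('x1', 1, 'A'), ('x2', 1, 'A'), ('x2', 2, 'B'), ('x3', 1, 'B')]
con_clones = [('a', 1, 'A'), ('a', 2, 'B'), ('b', 1, 'A'), ('b', 2, 'B')]

by_type_v, by_type_c = defaultdict(list), defaultdict(list)
for cl in var_clones:
    by_type_v[cl[2]].append(cl)
for cl in con_clones:
    by_type_c[cl[2]].append(cl)

matching = []
for t in by_type_v:
    vs, cs = by_type_v[t], by_type_c[t][:]
    random.shuffle(cs)          # a uniform pairing within each type class
    matching += list(zip(vs, cs))
print(matching)
```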

Lemma 5.1

Assume that $\ell\geq0$ and that $G'\in\mathcal G(M(G,\ell))$ is $(2\ell+4)$-acyclic. Then $G'$, viewed as an $M$-factor graph, satisfies $G'\cong_\ell G$.

We are going to show inductively for $l\in[\ell]$ that $G'\cong_l G$. The case $l=0$ is immediate from the construction. Thus, assume that $l>0$, let $(x,i)\in C_V$ and let $B$ be the set of all clones that have distance precisely $l-1$ from $(x,i)$. Since $G'$ is $(2l+2)$-acyclic, the pairwise distance of any two clones in $B$ is at least 2. Moreover, by induction we know that $t_{G,l-1}(w,j)=t_{G',l-1}(w,j)$ for all $(w,j)\in B$. Therefore, $t_{G,l}(x,i)=t_{G',l}(x,i)$.

In order to prove Proposition 4.7 we need to enhance the model $M(G,\ell)$ further to accommodate an assignment that provides a value from $\Omega$ for each clone. Thus, let $\hat\sigma:C_V\cup C_F\to\Omega$ be a map. We call $\hat\sigma$ valid if $\hat\sigma(x,i)=\hat\sigma(x,j)$ for all $x\in V,i,j\in[d(x)]$ and if for all $\theta\in\Theta_\ell$ we have

$\forall\omega\in\Omega:\quad|\{(x,i)\in C_V:\hat\sigma(x,i)=\omega,\,t_{G,\ell}(x,i)=\theta\}|=|\{(a,j)\in C_F:\hat\sigma(a,j)=\omega,\,t_{G,\ell}(a,j)=\theta\}|.$

Of course, we can extend a valid $\hat\sigma$ to a map $V\to\Omega,\ x\mapsto\hat\sigma(x,1)$. Given a valid $\hat\sigma$ we define a $(\Delta,\Omega,\Psi,\Theta_\ell\times\Omega)$-model $M(G,\hat\sigma,\ell)$ with variable nodes $V$, constraint nodes $F$, degrees $d$ and weight functions $(\psi_a)_{a\in F}$ such that the type $t_{G,\hat\sigma,\ell}(x,i)$ of a variable clone $(x,i)$ is $(\partial^\ell[G,x],i,\hat\sigma(x,i))$ and such that the type $t_{G,\hat\sigma,\ell}(a,j)$ of a constraint clone $(a,j)$ with $\partial(G,a,j)=(x,i)$ is $(\partial^\ell[G,x],i,\hat\sigma(a,j))$. By construction, $\mathcal G(M(G,\hat\sigma,\ell))\subseteq\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. Let us recall the definition of the distance from (3.5). Further, for two maps $\hat\sigma,\hat\sigma':C_V\cup C_F\to\Omega$ let $\mathrm{dist}(\hat\sigma,\hat\sigma')=|\{(v,i)\in C_V\cup C_F:\hat\sigma(v,i)\neq\hat\sigma'(v,i)\}|$. In Section 5.2 we are going to establish the following.

Lemma 5.2

For any $\varepsilon,\ell>0$ there is $n_0=n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ such that for $n>n_0$ the following holds. If $M$ is a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$, $G\in\mathcal G(M)$ is $100\ell$-acyclic and $\hat\sigma$ is valid, then with probability at least $1-\varepsilon$ the random factor graph $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$ has the following property. There exist a valid $\hat\sigma'$ and a $4\ell$-acyclic $G''\in\mathcal G(M(G,\hat\sigma',\ell))$ such that $\mathrm{dist}(\hat\sigma,\hat\sigma')+\mathrm{dist}(\boldsymbol G',G'')\leq n^{0.9}$.

To proceed consider a $(G,\ell)$-marginal sequence $q$. We call $\hat\sigma$ q-valid if the following two conditions hold.

  • 1
    V1: For all $T\in\mathcal T_V\cap\mathcal T_\ell,\omega\in\Omega$ we have
    $|\{x\in V:\partial^\ell[G,x]=T,\,\hat\sigma(x)=\omega\}|=q_T(\omega)\,|\{x\in V:\partial^\ell[G,x]=T\}|.$
  • 2
    V2: For all $T\in\mathcal T_F\cap\mathcal T_\ell,\omega_1,\dots,\omega_{d_T}\in\Omega$ we have
    $|\{a\in F:\partial^{\ell+1}[G,a]=T,\,\forall j\in[d_T]:\hat\sigma(a,j)=\omega_j\}|$
    $\qquad=q_T(\omega_1,\dots,\omega_{d_T})\,|\{a\in F:\partial^{\ell+1}[G,a]=T\}|.$

Lemma 5.3

For any $\varepsilon,\ell>0$ there is $n_0=n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ such that for $n>n_0$ the following holds. Assume that $M$ is a $(\Delta,\Omega,\Psi,\Theta)$-model of size $n$, $G\in\mathcal G(M)$ is $100\ell$-acyclic and $q$ is a $(G,\ell)$-marginal sequence such that there exists a q-valid $\hat\sigma$. Then with the sum ranging over all q-valid $\hat\sigma$ we have

$\exp\big(n\mathcal B_{G,\ell}(q)-\varepsilon n\big)\leq\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\mathcal G(M(G,\ell))|}\leq\exp\big(n\mathcal B_{G,\ell}(q)+\varepsilon n\big).$

We defer the proof of Lemma 5.3 to Section 5.3.

Proof of Proposition 4.7

We claim that

$|\{G'\in\mathcal G(M_n):G'\cong_\ell G\}|\geq|\mathcal G(M(G,\ell))|\exp(-n^{0.91}).$ (5.1)

To see this, apply Lemma 5.2 to the constant map $\hat\sigma:(v,j)\in C_V\cup C_F\mapsto\omega_0$ for some fixed $\omega_0\in\Omega$. Then we conclude that with probability at least $1/2$ the random graph $\boldsymbol G'\in\mathcal G(M(G,\ell))=\mathcal G(M(G,\hat\sigma,\ell))$ is at distance at most $n^{0.9}$ from a $4\ell$-acyclic $G''\in\mathcal G(M(G,\ell))\subseteq\mathcal G(M)$. Furthermore, by Lemma 5.1 this factor graph $G''$, viewed as an element of $\mathcal G(M)$, satisfies $G''\cong_\ell G$. Finally, since the total number of factor graphs at distance at most $n^{0.9}$ from $G''$ is bounded by $\exp(n^{0.91})$ because all degrees are bounded, we obtain (5.1).

Let $\delta>0$ be small enough. If $\sigma\in\Sigma(G,\ell,q,\delta)$, then by (4.7) there exists a $(G,\ell)$-marginal sequence $q'$ such that $\sigma\in\Sigma(G,\ell,q',0)$ and $\|q'_T-q_T\|_{\mathrm{TV}}<\delta$ for all $T\in\mathcal T_\ell$. Because $\mathcal T_\ell$ is finite and $\Sigma(G,\ell,q',0)\neq\emptyset$, the total number of such $q'$ is bounded by a polynomial in $n$. Moreover, due to the continuity of $\mathcal B_{G,\ell}(\cdot)$ we can choose $\delta=\delta(\ell)$ small enough so that $|\mathcal B_{G,\ell}(q)-\mathcal B_{G,\ell}(q')|<\varepsilon/2$ for all such $q'$. Hence, summing over all $\hat\sigma$ corresponding to $\sigma\in\Sigma(G,\ell,q,\delta)$, we obtain from (5.1) and Lemma 5.3 that

$\mathbb E\big[Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]\leq\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\{G'\in\mathcal G(M_n):G'\cong_\ell G\}|}\leq\exp\big(n\mathcal B_{G,\ell}(q)+\varepsilon n\big).$

Conversely, by Lemma 5.2 with probability at least $1/2$ the graph $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$ is within distance at most $n^{0.9}$ of a $4\ell$-acyclic $G''$, which satisfies $G''\cong_\ell G$ by Lemma 5.1. As before, the total number of graphs at distance at most $n^{0.9}$ of $G''$ is bounded by $\exp(n^{0.91})$. Similarly, the total number of $\hat\sigma'$ at distance at most $n^{0.9}$ of $\hat\sigma$ is bounded by $\exp(n^{0.91})$. Therefore, by Lemma 5.1

$\mathbb E\big[\mathbf 1\{\mathcal A_{2\ell+5}\}Z_{\ell,q,\delta}(\boldsymbol G)\,\big|\,\boldsymbol G\cong_\ell G\big]\geq\frac{\exp(-2n^{0.98})}2\sum_{\hat\sigma}\frac{|\mathcal G(M(G,\hat\sigma,\ell))|}{|\mathcal G(M(G,\ell))|}\geq\exp\big(n\mathcal B_{G,\ell}(q)-\varepsilon n\big),$

as desired.

5.2. Proof of Lemma 5.2

Let $\Theta^*=\{t_{G,\hat\sigma,\ell}(x,i):(x,i)\in C_V\}$ be the set of all possible types. For each $\tau\in\Theta^*$ let $n_\tau$ be the number of clones $(x,i)\in C_V$ with $t_{G,\hat\sigma,\ell}(x,i)=\tau$. Throughout this section we assume that $n>n_0(\varepsilon,\ell,\Delta,\Omega,\Psi,\Theta)$ is sufficiently large.

Lemma 5.4

There exists $\beta>0$ such that the following is true. For any $G,\hat\sigma$ there exists $3/4<\gamma<7/8$ such that for every $\tau\in\Theta^*$ either $n_\tau\leq n^\gamma$ or $n_\tau>n^{\gamma+\beta}$.

The number of possible types is bounded independently of $n$. Hence, choosing $\beta$ small enough, we can ensure that there exists an integer $j>0$ such that $3/4+(j+1)\beta<7/8$ and $[n^{3/4+j\beta},n^{3/4+(j+1)\beta}]\cap\{n_\tau:\tau\in\Theta^*\}=\emptyset$; then $\gamma=3/4+j\beta$ will do.

Fix $\beta,\gamma$ as in the previous lemma. Call $\tau$ rare if $n_\tau\leq n^\gamma$ and common otherwise. Let $Y$ be the number of variable clones that belong to cycles of length at most $10\ell$ in $\boldsymbol G'\in\mathcal G(M(G,\hat\sigma,\ell))$.

Lemma 5.5

For large enough n we have

E[Y]nγlnn

Let R be the set of variable clones (v, i) of a rare type and let U be the set of all variable clones whose distance from R in G does not exceed 100. Since the maximum degree as well as the total number of types are bounded, we have |U||R|lnlnnnγlnn, provided that n is big enough. Thus, to get the desired bound on E[Y] we merely need to consider the set W of common clones that are at distance more than 100 from R.

More specifically, let (v, i) be a common clone. We are going to bound the probability that (v,i)W and that (v, i) lies on a cycle of length at most 10. To this end, we are going to explore the (random) factor graph from (v, i) via the principle of deferred decisions. Let i1=i,,il[Δ] be a sequence of l10 indices. If (v, i) lies on a cycle of length at most 10, then there exists such a sequence (i1,,il) that corresponds to this cycle. Namely, with v1=v the cycle comprises of the clones (v1,i1),,(vl,il) such that (G(M(G,σ^,)),vj,ij)=(vj+1,ij+1). In particular, vl=v1. Clearly, the total number of sequences (i1,,il) is bounded. Furthermore, given that (v l, i l) is common, the probability that vl=v0 is bounded by 2nγ. Since γ>3/4, the linearity of expectation implies that E[Y]|U|+2n1γlnnnγlnn.

Lemma 5.6

Assume that $G'\in\mathcal G(M(G,\hat\sigma,\ell))$ satisfies $Y(G')\leq n^\gamma\ln^2n$. Then there is a $4\ell$-acyclic $G''\in\mathcal G(M(G,\hat\sigma,\ell))$ such that $\mathrm{dist}(G',G'')\leq n^{0.9}$.

Let $R$ be the set of variable clones $(v,i)$ of a rare type and let $U$ be the set of all variable clones whose distance from $R$ in $G'$ does not exceed $10\ell$. Moreover, let $G''\in\mathcal G(M(G,\hat\sigma,\ell))$ minimise $\mathrm{dist}(G',G'')$ subject to the condition that $\partial(G'',v,i)=\partial(G,v,i)$ for all $(v,i)\in U$. Then $\mathrm{dist}(G',G'')\leq n^\gamma\ln n$ because the total number of types is bounded. Therefore, the assumption $Y(G')\leq n^\gamma\ln^2n$ implies that $Y(G'')\leq n^\gamma\ln^3n$, say. In addition, because $G$ is $100\ell$-acyclic, none of the clones in $R$ lies on a cycle of length at most $4\ell$ in $G''$.

Altering only a bounded number of edges in each step, we are now going to remove the short cycles of $G''$ one by one. Let $C$ be the set of common clones. The construction of $G''$ ensures that only common clones lie on cycles of length at most $4\ell$. Consider one such clone $(v,i)$ and let $N$ be the set of all variable clones that can be reached from $(v,i)$ by traversing precisely two edges of $G''$; thus, $N$ contains all clones $(w,j)$ such that $w$ has distance two from $v$ and all clones $(v,j)$ that are incident to the same constraint node as $(v,i)$. Once more by the construction of $G''$ we have $N\subseteq C$. Furthermore, $|N|\leq\Delta^2$.

We claim that there exist $N'\subseteq C$ and a bijection $\xi:N\to N'$ such that the following conditions are satisfied.

  • (i)

    $t_{G,\hat\sigma,\ell}(w,j)=t_{G,\hat\sigma,\ell}(\xi(w,j))$ for all $(w,j)\in N$.

  • (ii)

    the pairwise distance in $G''$ between any two clones in $N'$ is at least $100\ell$.

  • (iii)

    the distance in $G''$ between $N\cup\{(v,i)\}$ and $N'$ is at least $100\ell$.

  • (iv)

    the distance between $R$ and $N'$ is at least $100\ell$.

  • (v)

    any $(w,j)\in N'$ is at distance at least $100\ell$ from any clone that belongs to a cycle of $G''$ of length at most $4\ell$.

Since the maximum degree of $G''$ is bounded by $\Delta$, no more than $n^\gamma\ln^4n$ clones violate condition (iii), (iv) or (v). By comparison, there are at least $n^{\gamma+\beta}$ clones of any common type. Hence, the existence of $\xi$ follows.

Now, obtain $G'''$ from $G''$ as follows.

  • let $\partial(G''',\xi(w,j))=\partial(G'',w,j)$ and $\partial(G''',w,j)=\partial(G'',\xi(w,j))$ for all $(w,j)\in N$.

  • let $\partial(G''',w,j)=\partial(G'',w,j)$ for all $(w,j)\notin N\cup N'$.

It is immediate from the construction that any clone on a cycle of length at most $4\ell$ in $G'''$ also lies on such a cycle of $G''$. Moreover, $(v,i)$ does not lie on a cycle of length at most $4\ell$ in $G'''$. Hence, $Y(G''')<Y(G'')$. In addition, all clones on cycles of length at most $4\ell$ and their neighbours are common. Hence, the construction can be repeated on $G'''$. Since $Y(G'')\leq n^\gamma\ln^3n$, we ultimately obtain a $4\ell$-acyclic $G^*$ with $\mathrm{dist}(G'',G^*)\leq n^\gamma\ln^4n<n^{0.9}$.

Proof of Lemma 5.2

The assertion is immediate from Lemmas 5.5 and 5.6 and Markov's inequality.

5.3. Proof of Lemma 5.3

Let $\mathcal V=\mathcal T_\ell\cap\mathcal T_V$ and for $T\in\mathcal V$ let $n_T$ be the number of variable nodes $x$ such that $\partial^\ell[G,x]=T$. By Stirling's formula the number $|\Sigma(G,\ell,q,0)|$ of assignments $\sigma:V_n\to\Omega$ with marginals as prescribed by $q$ satisfies

$\Big|\ln|\Sigma(G,\ell,q,0)|-\sum_{T\in\mathcal V}n_TH(q_T)\Big|\leq\ln^2n.$ (5.2)

Further, for $T\in\mathcal V$ and $i\in[d_T]$ let $C_V(T,i)$ be the set of all clones $(x,i)\in C_V$ such that $t_{G,\ell}(x,i)=(T,i)$. Moreover, let $C_F(T,i)$ be the set of all clones $(a,j)\in C_F$ such that $t_{G,\ell}(a,j)=(T,i)$. Additionally, let $F(T,i)$ be the set of all pairs $(T',j)$ with $T'\in\mathcal T_F\cap\mathcal T_\ell,j\in[d_{T'}]$ such that there is $(a,j)\in C_F(T,i)$ with $\partial^{\ell+1}[G,a]=T'$. Of course, the total number of perfect matchings between $C_V(T,i)$ and $C_F(T,i)$ equals $n_T!$. If we fix $\sigma\in\Sigma(G,\ell,q,0)$, then any such perfect matching induces an assignment $\hat\sigma:C_F(T,i)\to\Omega$ by mapping a clone $(a,j)\in C_F(T,i)$ matched to $(x,i)$ to the value $\sigma(x)$. Let $B_{T,i}$ be the event that in such a random matching for all $(T',j)\in F(T,i)$ and all $\omega$ we have

$|\{(a,j)\in C_F:\partial^{\ell+1}[G,a]=T',\,\hat\sigma(a,j)=\omega\}|=q_{T'}^j(\omega)\,|\{(a,j)\in C_F:\partial^{\ell+1}[G,a]=T'\}|.$

Moreover, for $(T',j)\in F(T,i)$ let $m_{T'}$ be the number of $a\in F$ such that $\partial^{\ell+1}[G,a]=T'$. Then, writing $\binom{n}{(k_i)_i}$ for multinomial coefficients,

$\mathbb P[B_{T,i}]=\frac1{n_T!}\bigg[\prod_{\omega\in\Omega}\binom{q_T(\omega)n_T}{(q_{T'}^j(\omega)m_{T'})_{(T',j)\in F(T,i)}}\bigg]\times\bigg[\prod_{(T',j)\in F(T,i)}\binom{m_{T'}}{(q_{T'}^j(\omega)m_{T'})_{\omega\in\Omega}}\bigg]\prod_{(T',j)\in F(T,i),\,\omega\in\Omega}(q_{T'}^j(\omega)m_{T'})!$
$\qquad=\binom{n_T}{(q_T(\omega)n_T)_{\omega\in\Omega}}^{-1}\prod_{(T',j)\in F(T,i)}\binom{m_{T'}}{(q_{T'}^j(\omega)m_{T'})_{\omega\in\Omega}}$
$\qquad=\exp\bigg[O(\ln n)-\sum_{(T',j)\in F(T,i)}m_{T'}D\big(q_{T'}^j\|q_T\big)\bigg].$

Multiplying up over all $(T,i)$, we obtain for $B=\bigcap_{T,i}B_{T,i}$

$\mathbb P[B]=\prod_{T\in\mathcal V}\prod_{i\in[d_T]}\mathbb P[B_{T,i}]=\exp\bigg[O(\ln n)-\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}\sum_{j\in[d_T]}m_TD\big(q_T^j\|q_{T_j}\big)\bigg],$ (5.3)

where the constant hidden in the $O(\cdot)$ depends on $\Delta,\Omega,\Psi,\Theta,\ell$ only.

Further, for $T\in\mathcal T_F\cap\mathcal T_\ell$ let $S_T$ be the event that for every $(\omega_1,\dots,\omega_{d_T})\in\Omega^{d_T}$ we have

$|\{a\in F:\partial^{\ell+1}[G,a]=T,\,\forall j\in[d_T]:\hat\sigma(a,j)=\omega_j\}|$
$\qquad=q_T(\omega_1,\dots,\omega_{d_T})\,|\{a\in F:\partial^{\ell+1}[G,a]=T\}|.$

Then

$\mathbb P[S_T|B]=\binom{m_T}{(m_Tq_T(\omega))_\omega}\prod_{j\in[d_T]}\binom{m_T}{(m_Tq_T^j(\omega))_\omega}^{-1}=\exp\big[O(\ln n)-m_TD\big(q_T\|q_T^1\otimes\cdots\otimes q_T^{d_T}\big)\big].$ (5.4)
 

Moreover,

$D\big(q_T\|q_T^1\otimes\cdots\otimes q_T^{d_T}\big)=-H(q_T)+\sum_{j\in[d_T]}H(q_T^j)=-H(q_T)+\sum_{j\in[d_T]}H(q_{T_j})-\sum_{j\in[d_T]}\big(H(q_{T_j})-H(q_T^j)\big).$ (5.5)

In addition,

$H(q_{T_j})-H(q_T^j)=-\sum_{\omega\in\Omega}q_{T_j}(\omega)\ln q_{T_j}(\omega)+\sum_{\omega\in\Omega}q_T^j(\omega)\ln q_T^j(\omega)$
$\qquad=D\big(q_T^j\|q_{T_j}\big)+\sum_{\omega\in\Omega}\big(q_T^j(\omega)-q_{T_j}(\omega)\big)\ln q_{T_j}(\omega).$ (5.6)

Further, because $q$ is a $(G,\ell)$-marginal sequence, condition MS3 guarantees that

$\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\sum_{j\in[d_T]}\sum_{\omega\in\Omega}\big(q_T^j(\omega)-q_{T_j}(\omega)\big)\ln q_{T_j}(\omega)=0.$ (5.7)

Hence, letting $S=\bigcap_TS_T$, we obtain from (5.4)–(5.7)

$\mathbb P[S|B]=\exp\bigg[O(\ln n)+\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\Big[H(q_T)+\sum_{j\in[d_T]}\big[D\big(q_T^j\|q_{T_j}\big)-H(q_{T_j})\big]\Big]\bigg].$ (5.8)

Once more the constant hidden in the $O(\cdot)$ depends on $\Delta,\Omega,\Psi,\Theta,\ell$ only. Further, given $S\cap B$ we have

$\prod_{a\in F}\psi_a(\sigma)=\exp\bigg[\sum_{T\in\mathcal T_F\cap\mathcal T_\ell}m_T\langle\ln\psi_T(\sigma)\rangle_{q_T}\bigg].$ (5.9)

Finally, the assertion follows from (5.2), (5.3), (5.8) and (5.9); note that the divergence terms in (5.3) and (5.8) cancel, leaving precisely the entropy and energy terms of $\mathcal B_{G,\ell}(q)$.
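The identity behind (5.5), namely that the divergence from the product of the marginals equals the entropy defect, can be checked numerically; the sketch below (ours) does so for a random distribution on $\Omega^3$.

```python
import numpy as np
rng = np.random.default_rng(0)

def H(p):
    p = p.ravel()
    return -(p[p > 0] * np.log(p[p > 0])).sum()

# check of the identity in (5.5): for a joint distribution q on Omega^3
# with marginals q_1, q_2, q_3,
#   D(q || q_1 x q_2 x q_3) = -H(q) + sum_j H(q_j)
q = rng.random((3, 3, 3)); q /= q.sum()
margs = [q.sum(axis=tuple(k for k in range(3) if k != j)) for j in range(3)]
prod = np.einsum('i,j,k->ijk', *margs)
D = (q * np.log(q / prod)).sum()
print(D, -H(q) + sum(H(m) for m in margs))  # equal
```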

ACKNOWLEDGEMENTS

The second author thanks Dimitris Achlioptas for inspiring discussions. We also thank two anonymous reviewers for their careful reading and their invaluable comments, which led to an improved version of Corollary 2.4, among other things.

Supported by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.

A preliminary version [10] of this paper, presented by the first author at RANDOM 2015 and by the second author at the RS&A 2015 conference, contained a critical technical error that affected its main results. This present version is based on similar key insights but the main results are different from the ones stated in [10].

Contributor Information

Victor Bapst, Email: bapst@math.uni-frankfurt.de.

Amin Coja‐Oghlan, Email: acoghlan@math.uni-frankfurt.de.

REFERENCES

  • 1. Achlioptas D. and Coja‐Oghlan A., Algorithmic barriers from phase transitions, In Proceedings of 49th FOCS, IEEE, Philadelphia, 2008, pp. 793–802.
  • 2. Achlioptas D. and Moore C., Random k‐SAT: Two moments suffice to cross a sharp threshold, SIAM J Comput 36 (2006), 740–762. [Google Scholar]
  • 3. Achlioptas D. and Naor A., The two possible values of the chromatic number of a random graph, Ann Math 162 (2005), 1333–1349. [Google Scholar]
  • 4. Achlioptas D., Naor A., and Peres Y., Rigorous location of phase transitions in hard optimization problems, Nature 435 (2005), 759–764. [DOI] [PubMed] [Google Scholar]
  • 5. Achlioptas D., Naor A., and Peres Y., On the maximum satisfiability of random formulas, J ACM 54 (2007). [Google Scholar]
  • 6. Achlioptas D. and Peres Y., The threshold for random k‐SAT is $2^k\ln 2-O(k)$, J AMS 17 (2004), 947–973. [Google Scholar]
  • 7. Aldous D., Representations for partially exchangeable arrays of random variables, J Multivariate Anal 11 (1981), 581–598. [Google Scholar]
  • 8. Aldous D. and Steele J., The objective method: probabilistic combinatorial optimization and local weak convergence, In Kesten H. editors, Probability on discrete structures, Encyclopaedia of Mathematical Sciences, Vol. 110, Springer, Berlin, 2004, pp. 1–72. [Google Scholar]
  • 9. Bandyopadhyay A. and Gamarnik D., Counting without sampling: Asymptotics of the log‐partition function for certain statistical physics models, Random Struct Algorithms 33 (2008), 452–479. [Google Scholar]
  • 10. Bapst V. and Coja‐Oghlan A., Harnessing the Bethe free energy, In Proceedings of 19th RANDOM, Leibniz International Proceedings in Informatics, Princeton, 2015, pp. 467–480, Also available as arXiv:1504.03975, version 1.
  • 11. Bapst V. and Coja‐Oghlan A., The condensation phase transition in the regular k‐SAT model, In Proceedings of 20th RANDOM, Leibniz International Proceedings in Informatics, Paris, 2016, pp. 22:1–22:18.
  • 12. Bapst V., Coja‐Oghlan A., Hetterich S., Rassmann F., and Vilenchik D., The condensation phase transition in random graph coloring, Comm Math Phys 341 (2016), 543–606. [Google Scholar]
  • 13. Bapst V., Coja‐Oghlan A., and Rassmann F. A positive temperature phase transition in random hypergraph 2‐coloring, Ann Appl Probab 26 (2016), 1362–1406. [Google Scholar]
  • 14. Barak B., Rao A., Shaltiel R., and Wigderson A., 2‐source dispersers for sub‐polynomial entropy and Ramsey graphs beating the Frankl‐Wilson construction, In Proceedings of 38th STOC, ACM, Seattle, 2006, pp. 671–680.
  • 15. Bayati M., Gamarnik D. and Tetali P., Combinatorial approach to the interpolation method and scaling limits in sparse random graphs, Ann Probab 41 (2013), 4080–4115. [Google Scholar]
  • 16. Bordenave C. and Caputo P., Large deviations of empirical neighborhood distribution in sparse random graphs, Probab Theory Relat Fields 163 (2015), 149–222. [Google Scholar]
  • 17. Coja‐Oghlan A. and Panagiotou K., The asymptotic k‐SAT threshold, Adv Math 288 (2016), 985–1068. [Google Scholar]
  • 18. Coja‐Oghlan A., Perkins W., and Skubch K., Limits of discrete distributions and Gibbs measures on random graphs, preprint, arXiv:1512.06798, 2015.
  • 19. Coja‐Oghlan A. and Perkins W., Belief Propagation on replica symmetric random factor graph models, In Proceedings of 20th RANDOM, Leibniz International Proceedings in Informatics, 2016, pp. 27:1–27:15.
  • 20. Coja‐Oghlan A. and Zdeborová L., The condensation transition in random hypergraph 2‐coloring, In Proceedings of 23rd SODA, ACM‐SIAM, Kyoto, 2012, pp. 241–250.
  • 21. Contucci P., Dommers S., Giardina C., and Starr S., Antiferromagnetic Potts model on the Erdös‐Rényi random graph, Commun Math Phys 323 (2013), 517–554. [Google Scholar]
  • 22. Dembo A. and Montanari A., Ising models on locally tree‐like graphs, Ann Appl Probab 20 (2010), 565–592. [Google Scholar]
  • 23. Dembo A., Montanari A., and Sun N., Factor models on locally tree‐like graphs, Ann Probab 41 (2013), 4162–4213. [Google Scholar]
  • 24. Dembo A., Montanari A., Sly A. and Sun N., The replica symmetric solution for Potts models on d‐regular graphs, Comm Math Phys 327 (2014), 551–575. [Google Scholar]
  • 25. Ding J., Sly A., and Sun N., Satisfiability threshold for random regular NAE‐SAT, In Proceedings of 46th STOC, ACM, New York, 2014, pp. 814–822.
  • 26. Ding J., Sly A., and Sun N., Proof of the satisfiability conjecture for large k , In Proceedings of 47th STOC, ACM, Portland, 2015, pp. 59–68.
  • 27. Durrett R., Probability: Theory and examples, 4th edition, Cambridge University Press, Cambridge, 2010. [Google Scholar]
  • 28. Erdös P., Some remarks on the theory of graphs, Bull Am Math Soc 53 (1947), 292–294. [Google Scholar]
  • 29. Erdös P., Graph theory and probability, Canad J Math 11 (1959), 34–38. [Google Scholar]
  • 30. Franz S. and Leone M., Replica bounds for optimization problems and diluted spin systems, J Stat Phys 111 (2003), 535–564. [Google Scholar]
  • 31. Galanis A., Stefankovic D., and Vigoda E., Inapproximability for antiferromagnetic spin systems in the tree non‐uniqueness region, In Proceedings of 46th STOC, ACM, New York, 2014, pp. 823–831.
  • 32. Guerra F., Broken replica symmetry bounds in the mean field spin glass model, Comm Math Phys 233 (2003), 1–12. [Google Scholar]
  • 33. Hoover D., Relations on probability spaces and arrays of random variables, Preprint, Institute of Advanced Studies, Princeton, 1979.
  • 34. Krzakala F., Montanari A., Ricci‐Tersenghi F., Semerjian G. and Zdeborova L., Gibbs states and the set of solutions of random constraint satisfaction problems, Proc Nat Acad Sci 104 (2007), 10318–10323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Lovász L., Large networks and graph limits, Vol. 60, Colloquium Publications, AMS, Providence, 2012. [Google Scholar]
  • 36. Mézard M. and Montanari A., Information, physics and computation, Oxford University Press, Oxford, 2009. [Google Scholar]
  • 37. Mézard M. and Parisi G., and Zecchina R., Analytic and algorithmic solution of random satisfiability problems, Science 297 (2002), 812–815. [DOI] [PubMed] [Google Scholar]
  • 38. Montanari A. and Shah D., Counting good truth assignments of random k‐SAT formulae, In Proceedings of 18th SODA, ACM‐SIAM, New Orleans, 2007, pp. 1255–1264.
  • 39. Mossel E., Weitz D., and Wormald N., On the hardness of sampling independent sets beyond the tree threshold, Probab Theory Relat Fields 143 (2009), 401–439. [Google Scholar]
  • 40. Mossel E., Neeman J. and Sly A., Reconstruction and estimation in the planted partition model, Probab Theory Relat Fields 162 (2014), 1–31. [Google Scholar]
  • 41. Nešetřil J., A combinatorial classic–sparse graphs with high chromatic number. In Lovász L. et al. editors, Erdös Centennial, Springer, Berlin, 2013. [Google Scholar]
  • 42. Panchenko D. and Talagrand M., Bounds for diluted mean‐fields spin glass models, Probab Theory Relat Fields 130 (2004), 319–336. [Google Scholar]
  • 43. Paley R. and Zygmund A., On some series of functions, (3), Proc Camb Philos Soc 28 (1932), 190–205. [Google Scholar]
  • 44. Panchenko D., Spin glass models from the point of view of spin distributions, Ann Probab 41 (2013), 1315–1361. [Google Scholar]
  • 45. Panchenko D., The Sherrington‐Kirkpatrick model, Springer, Berlin, 2013. [Google Scholar]
  • 46. Richardson T. and Urbanke R., Modern coding theory, Cambridge University Press, Cambridge, 2008. [Google Scholar]
  • 47. Robinson R. and Wormald N., Almost all regular graphs are Hamiltonian, Random Struct Algorithms 5 (1994), 363–374. [Google Scholar]
  • 48. Sly A. and Sun N., The computational hardness of counting in two‐spin models on d‐regular graphs, In Proceedings of 53rd FOCS, IEEE, New Brunswick, 2012, pp. 361–369.
  • 49. Szemerédi E., Regular partitions of graphs, Colloq Inter CNRS 260 (1978), 399–401. [Google Scholar]
  • 50. Tao T., Szemerédi's regularity lemma revisited, Contrib Discrete Math 1 (2006), 8–28. [Google Scholar]
  • 51. Turán P., On a theorem of Hardy and Ramanujan, J London Math Soc 9 (1934), 274–276. [Google Scholar]
