Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2020 Apr 7.
Published in final edited form as: Adv Neural Inf Process Syst. 2019 Dec;32:10552–10563.

G2SAT: Learning to Generate SAT Formulas

Jiaxuan You 1,*, Haoze Wu 1,*, Clark Barrett 1, Raghuram Ramanujan 2, Jure Leskovec 1
PMCID: PMC7138247  NIHMSID: NIHMS1062394  PMID: 32265581

Abstract

The Boolean Satisfiability (SAT) problem is the canonical NP-complete problem and is fundamental to computer science, with a wide array of applications in planning, verification, and theorem proving. Developing and evaluating practical SAT solvers relies on extensive empirical testing on a set of real-world benchmark formulas. However, the availability of such real-world SAT formulas is limited. While these benchmark formulas can be augmented with synthetically generated ones, existing approaches for doing so are heavily hand-crafted and fail to simultaneously capture a wide range of characteristics exhibited by real-world SAT instances. In this work, we present G2SAT, the first deep generative framework that learns to generate SAT formulas from a given set of input formulas. Our key insight is that SAT formulas can be transformed into latent bipartite graph representations which we model using a specialized deep generative neural network. We show that G2SAT can generate SAT formulas that closely resemble given real-world SAT instances, as measured by both graph metrics and SAT solver behavior. Further, we show that our synthetic SAT formulas could be used to improve SAT solver performance on real-world benchmarks, which opens up new opportunities for the continued development of SAT solvers and a deeper understanding of their performance.

1. Introduction

The Boolean Satisfiability (SAT) problem is central to computer science, and finds many applications across Artificial Intelligence, including planning [26], verification [9], and theorem proving [16]. SAT was the first problem to be shown to be NP-complete [11], and there is believed to be no general procedure for solving arbitrary SAT instances efficiently. Nevertheless, modern solvers are able to routinely decide large SAT instances in practice, with different algorithms proving to be more successful than others on particular problem instances. For example, incomplete search methods such as WalkSAT [37] and survey propagation [8] are more effective at solving large, randomly generated formulas, while complete solvers leveraging conflict-driven clause learning (CDCL) [32] fare better on large structured SAT formulas that commonly arise in industrial settings.

Understanding, developing and evaluating modern SAT solvers relies heavily on extensive empirical testing on a suite of benchmark SAT formulas. Unfortunately, in many domains, availability of benchmarks is still limited. While this situation has improved over the years, new and interesting benchmarks — both real and synthetic — are still in demand and highly welcomed by the SAT community. Developing expressive generators of structured SAT formulas is important, as it would provide for a richer set of evaluation benchmarks, which would in turn allow for the development of better and faster SAT solvers. Indeed, the problem of pseudo-industrial SAT formula generation has been identified as one of the ten key challenges in propositional reasoning and search [38].

One promising approach for tackling this challenge is to represent SAT formulas as graphs, thus recasting the original problem as a graph generation task. Specifically, every SAT formula can be converted into a corresponding bipartite graph, known as its literal-clause graph (LCG), via a bijective mapping. Prior work in pseudo-industrial SAT instance generation has relied on hand-crafted algorithms [17, 18], focusing on capturing one or two of the graph statistics exhibited by real-world SAT formulas [1, 36]. As researchers continue to uncover new and interesting characteristics of real-world SAT instances [1, 25, 36, 33], previous SAT generators might become invalid, and handcrafting new models that simultaneously capture all the pertinent properties becomes increasingly difficult. On the other hand, recent work on deep generative models for graphs [6, 31, 45, 47] has demonstrated their ability to capture many of the essential features of real-world graphs such as social networks and citation networks, as well as graphs arising in biology and chemistry. However, these models do not enforce bipartiteness and therefore cannot be directly employed in our setting. While it might be possible to post-process these generated graphs, such a solution would be ad hoc, computationally expensive and would fail to exploit the unique structure of bipartite graphs.

In this paper, we present G2SAT, the first deep generative model that learns to generate SAT formulas based on a given set of input formulas. We use LCGs to represent SAT formulas, and formulate the task of SAT formula generation as a bipartite graph generation problem. Our key insight is that any bipartite graph can be generated by starting with a set of trees, and then applying a sequence of node merging operations over the nodes from one of the two partitions. As we merge nodes, trees are also merged, and complex bipartite structures begin to appear (Figure 1, left). In this manner, a set of input bipartite graphs (SAT formulas) can be characterized by a distribution over the sequence of node merging operations. Assuming we can capture/learn the distribution over the pairs of nodes to merge, we can start with a set of trees and then keep merging nodes in order to generate realistic bipartite graphs (i.e., realistic SAT formulas). G2SAT models this iterative node merging process in an auto-regressive manner, where a node merging operation is viewed as a sample from the underlying conditional distribution that is parameterized by a Graph Convolutional Neural Network (GCN) [20, 21, 29, 46], and the same GCN is shared across all the steps of the generation process.

Figure 1:

Figure 1:

An overview of the proposed G2SAT model. Top left: A given bipartite graph can be decomposed into a set of disjoint trees by applying a sequence of node splitting operations. Orange node c in graph Gi is split into two blue c nodes in graph Gi−1. Every time a node is split, one more node appears in the right partition. Right: We use node pairs gathered from such a sequence of node splitting operations to train a GCN-based classifier that predicts whether a pair of c nodes should be merged. Bottom left: Given such a classifier, G2SAT generates a bipartite graph by starting with a set of trees G0 and applying a sequence of node merging operations, where two blue nodes in graph Gi−1 get merged in graph Gi. G2SAT uses the GCN-based classifier that captures the bipartite graph structure to sequentially decide which nodes to merge from a set of candidates. Best viewed in color.

This formulation raises the following question: how do we devise a sequential generative process when we are only given a static input SAT formula? In other words, how do we generate training data for our generator? We resolve this challenge as follows (Figure 1). We define node splitting as the inverse operation to node merging. We apply this node splitting operation to a given input bipartite graph (a real-world SAT formula) and decompose it into a set of trees. We then reverse the splitting sequence, so that we start with a set of trees and learn from the sequence of node merging operations that recovers a realistic SAT formula. We train a GCN-based classifier that decides which two nodes to merge next, based on the structure of the graph generated so far.

At graph generation time, we initialize G2SAT with a set of trees and iteratively merge node pairs based on the conditional distribution parameterized by the trained GCN model, until a user-specified stopping criterion is met. We utilize an efficient two-phase generation procedure: in the node proposal phase, candidate node pairs are randomly drawn, and in the node merging phase, the learned GCN model is applied to select the most likely node pair to merge.

Experiments demonstrate that G2SAT is able to generate formulas that closely resemble the input real-world SAT instances in many graph-theoretic properties such as modularity and the presence of scale-free structures, with 24% lower relative error compared to state-of-the-art methods. More importantly, G2SAT generates formulas that exhibit the same hardness characteristics as real-world SAT formulas in the following sense: when the generated instances are solved using various SAT solvers, those solvers that are known to be effective on structured, real-world instances consistently outperform those solvers that are specialized in solving random SAT instances. Moreover, our results suggest that we can use our synthetically generated formulas to more effectively tune the hyperparameters of SAT solvers, achieving an 18% speedup in run time on unseen formulas, compared to tuning the solvers only on the formulas used for training.1

2. Preliminaries

Goal of generating SAT formulas

Our goal is to design a SAT generator that, given a suite of SAT formulas, generates new SAT formulas with similar properties. Our aim is to capture not only graph theoretic properties, but also realistic SAT solver behavior. For example, if we train our G2SAT model on formulas from application domain X, then solvers that traditionally excel in solving problems in domain X, should also excel in solving synthetic G2SAT formulas (rather than, say, solvers that specialize in solving random SAT formulas).

SAT formulas and their graph representations

A SAT formula ϕ is composed of Boolean variables xi connected by the logical operators and (∧), or (∨), and not (¬). A formula is satisfiable if there exists an assignment of Boolean values to the variables such that the overall formula evaluates to true. In this paper, we are concerned with formulas in Conjunctive Normal Form (CNF)2, i.e., formulas expressed as conjunctions of disjunctions. Each disjunction is called a clause, while a Boolean variable xi or its negation ¬xi is called a literal. For example, (x1x2 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2) is a CNF formula with two clauses that can be satisfied by assigning true to x1 and false to x2.

Traditionally, researchers have studied four different graph representations for SAT formulas [4]: (1) Literal-Clause Graph (LCG): there is a node for each literal and each clause, with an edge denoting the occurrence of a literal in a clause. An LCG is bipartite and there exists a bijection between CNF formulas and LCGs. (2) Literal-Incidence Graph (LIG): there is a node for each literal and two literals have an edge if they co-occur in a clause. (3) Variable-Clause Graph (VCG): obtained by merging the positive and negative literals of the same variables in an LCG. (4) Variable-Incidence Graph (VIG): obtained by performing the same literal merging operation on the LIG. In this paper, we use LCGs to represent SAT formulas.

LCGs as bipartite graphs

We represent a bipartite graph G=(VG,EG) by its node set VG={v1G,,vnG} and edge set EG{(viG,vjG)|viG,vjGVG}. In the rest of paper, we omit the superscript G whenever it is possible to do so. Nodes in a bipartite graph can be split into two disjoint partitions V1 and V2 such that V=V1V2. Edges only exist between nodes in different partitions, i.e., E{(vi,vj)|viV1,vjV2}. An LCG with n literals and m clauses has V1={l1,,ln} and V2={c1,,cm}, where V1 and V2 are referred to as the literal partition and the clause partition, respectively. We may also write out li as xi or ¬xi when specifying the literal sign is necessary.

Benefits of using LCGs to generate SAT formulas

While we choose to work with LCGs because they are bijective to SAT formulas, the LIG is also a viable alternative. Unlike LCGs, there are no explicit constraints over LIGs, and thus, previously developed general deep graph generators could in principle be used. However, the ease of generating LIGs is offset by the fact that key information is lost during the translation from the corresponding SAT formula. In particular, given a pair of literals, the LIG only encodes whether they co-occur in a clause but fails to capture how many times and in which clauses they co-occur. It can further be shown that an LIG corresponds to a number of SAT formulas that is at least exponential in the number of 3-cliques in the LIG. This ambiguity severely limits the usefulness of LIGs for SAT benchmark generation.

3. Related Work

SAT Generators

Existing synthetic SAT generators are hand-crafted models that are typically designed to generate formulas that fit a particular graph statistic. The mainstream generators for pseudo-industrial SAT instances include the Community Attachment (CA) model [17], which generates formulas with a given VIG modularity, and the Popularity-Similarity (PS) model [18], which generates formulas with a specific VCG degree distribution. In addition, there are also generators for random k-SAT instances [7] and crafted instances that come from translations of structured combinatorial problems, such as graph coloring, graph isomorphism, and Ramsey numbers [30]. Currently, all SAT generators are hand-crafted and machine learning provides an exciting alternative.

Deep Graph Generators

Existing deep generative models of graphs fall into two categories. In the first class are models that focus on generating perturbations of a given graph, by direct decoding from computed node embeddings [28] or latent variables [19], or learning the random walk distribution of a graph [6]. The second class comprises models that can learn to generate a graph by sequentially adding nodes and edges [31, 47, 45]. Domain specific generators for molecular graphs [12, 24] and 3D point cloud graphs [42] have also been developed. However, current deep generative models of graphs do not readily apply to SAT instance generation. Thus, we develop a novel bipartite graph generator that respects all the constraints imposed by graphical representations of SAT formulas and generates the formula graph via a sequence of node merging operations.

Deep learning for SAT

NeuroSAT also represents SAT formulas as graphs and computes node embeddings using GCNs [40]. However, NeuroSAT focuses on using the embeddings to solve SAT formulas, while we aim to generate SAT formulas. A preliminary version of the work presented in this paper appeared in [43], where existing graph generative models were used to learn the LIG of a SAT formula. However, extensive post-processing was required to extract a formula from a generated LIG, since an LIG is an ambiguous representation of a SAT formula. In this work, we develop a new deep graph generative model, that, unlike existing techniques, is able to directly learn the bijective graph representation of a SAT formula, and therefore better capture its characteristics.

4. The G2SAT Framework

4.1. G2SAT: Generating Bipartite Graphs by Iterative Node Merging Operations

As discussed in Section 2, a SAT formula is uniquely represented by its LCG which is a bipartite graph. From the perspective of generative models, our primary objective is to learn a distribution over bipartite graphs, based on a set of observed bipartite graphs G sampled from the data distribution p(G). Each bipartite graph GG may have a different number of nodes and edges. Due to the complex dependency between nodes and edges, directly learning p(G) is challenging. Therefore, we generate a graph via an n-step iterative process, p(G)=i=1np(Gi|G1,,Gi1), where Gi refers to an intermediate graph at step i in the iterative generation process. Since we focus on generating static graphs, we assume that the order of the generation trajectory does not matter, as long as the same graph is generated. This assumption implies the following Markov property over the conditional distribution, p(Gi|G1, …, Gi−1) = p(Gi|Gi−1).

The key to a successful iterative graph generative model is a proper instantiation of the conditional distribution p(Gi|Gi−1). Existing approaches [31, 45, 47] often model p(Gi|Gi−1) as the distribution over the random addition of nodes or edges to Gi−1. While in theory this formulation allows the generation of any kind of graph, it cannot satisfy the hard partition constraint for bipartite graphs. In contrast, our proposed G2SAT has a simple generation process that is guaranteed to preserve the bipartite partition constraint, without the need for hand-crafted generation rules or post-processing procedures. The G2SAT framework relies on node splitting and merging operations, which are defined as follows.

Definition 1

The node splitting operation, when applied to node v, removes some edges between v and its neighboring nodes, and then connects those edges to a new node u. The node merging operation, when performed over two nodes u and v, removes all the edges between v and its neighboring nodes, and then connects those edges to u. Formally, NodeSplit(u,G) returns a tuple (u, v,G′), and NodeMerge(u, v,G) returns a tuple (u,G′).

Note that according to this definition, a node merging operation can always be reversed by a node splitting operation. The core idea underlying G2SAT is then motivated by the following observation.

Observation 1

A bipartite graph can always be transformed into a set of trees by a sequence of node splitting operations over the nodes in one of the partitions.

The proof of this claim follows from the fact that the node splitting operation strictly reduces a node’s degree. Therefore, repeatedly applying node splitting to all the nodes in a partition will ultimately reduce the degree of all those nodes to 1, producing a set of trees (Figure 1, Left). This observation implies that a bipartite graph can always be generated via a sequence of node merging operations. In G2SAT, we always merge clause nodes in the clause partition V2Gi1 for a given graph Gi−1. We then design the following instantiation of p(Gi|Gi−1),

p(Gi|Gi1)=p(NodeMerge(u,v,Gi1)|Gi1)=Multimonial(huThv/Z|u,vV2Gi1) (1)

where hu and hv are the embeddings for nodes u and v, and Z is the normalizing constant that ensures that the distribution Multimonial(·) is valid. We aim for embeddings hu that capture the multi-hop neighborhood structure of a node u and that can be computed from a single trainable model. Further, this model needs to be capable of generalizing across different generation stages and different graphs. Therefore, we use the GraphSAGE framework [20] to compute node embeddings, which is a variant of GCNs that has been shown to have strong inductive learning capabilities across different graphs. Specifically, the l-th layer of GraphSAGE can be written as

nu(l)=AGG(RELU(Q(l)hv(l)+q(l)|vN(u)))
hu(l+1)=RELU(W(l)CONCAT(hu(l),nu(l)))

where hu(l) is the l-th layer node embedding for node u, N(u) is the local neighborhood of u, AGG(·) is an aggregation function such as mean pooling, and Q(l), q(l), W(l) are trainable parameters. The input node features are length-3 one-hot vectors, which are used to represent the three node types in LCGs, i.e., positive literals, negative literals and clauses. In addition, since each literal and its negation are closely related, we add an additional message passing path between them.

4.2. Scalable G2SAT with Two-phase Generation Scheme

LCGs can easily have tens of thousands of nodes; thus, there are millions of candidate node pairs that could be merged. This makes the computation of the normalizing constant Z in Equation 1 infeasible. To avoid this issue, we design a two-phase scheme to instantiate Equation 1, which includes a node proposal phase and a node merging phase (Figure 1, right). Intuitively, the idea is to begin with a fixed oracle that proposes random candidate node pairs. Then, a model only needs to decide if the proposed node pair should be merged or not, which is an easier learning task compared to selecting from among millions of candidate options. Instead of directly learning and sampling from p(Gi|Gi−1), we introduce additional random variables u and v to represent random nodes, and then learn the joint distribution p(Gi, u, v|Gi−1) = p(u, v|Gi−1)p(Gi|Gi−1, u, v). Here, p(u, v|Gi−1) corresponds to the node proposal phase and p(Gi|Gi−1, u, v) models the node merging phase.

In theory, p(u, v|Gi−1) can be any distribution as long as it has non-empty support. Since LCGs are inherently static graphs, there is little prior knowledge or additional information on how this iterative generation process should proceed. Therefore, we implement the node proposal phase such that a random node pair is sampled from all candidate clause nodes uniformly at random. Then, in the node merging phase, instead of computing the dot product between all possible node pairs, the model only needs to compute the dot product between the sampled node pairs. Specifically, we have

p(Gi,u,v|Gi1)=p(u,v|Gi1)p(NodeMerge(u,v,Gi1)|Gi1,u,v)=Uniform({(u,v)|u,vV2Gi1})Bernoulli(σ(huThv)|u,v) (2)

where Uniform is the discrete uniform distribution and σ(·) is the sigmoid function.

4.3. G2SAT at Training Time

The two-phase generation scheme described in Section 4.2 transforms the bipartite graph generation task into a binary classification task. We train the classifier to minimize the following binary cross entropy loss:

L=Eu,vppos[log(σ(huThv))]Eu,vpneg[log(1σ(huThv))] (3)

where ppos and pneg are the distributions over positive and negative training examples (i.e. node pairs). We say a node pair is a positive example if the node pair should be merged according to the training set. To acquire the necessary training data from input bipartite graphs, we develop a procedure that is described in Algorithm 1. Given an input bipartite graph G, we apply the node splitting operation to the graph for n=|EG||V2G| steps, which guarantees that the input graph will be decomposed into a set of trees. Within each step, a random node s in partition V2Gi with degree greater than 1 is chosen for splitting, and a random subset of edges that connect to s is chosen to connect to a new node. After the split operation, we obtain an updated graph Gi−1, as well as the split nodes u+ and v+, which are treated as a positive training example. Then, another node v, that is distinct from u+ and v+, is randomly chosen from the nodes in V2Gi1, and (u+, v) are viewed as a negative training example. The data tuple (u+, v+, v, Gi−1) is saved in the dataset D. We also save the step count n and the graph G0 as “graph templates”, which are later used to initialize G2SAT at inference time. The procedure is repeated r times until the desired number of data points are gathered. Finally, G2SAT is trained with the dataset D to minimize the objective listed in Equation 3.

Algorithm 1.

G2SAT at training time

Input: Bipartite graphs G, number of repetitions r
Output: Graph templates T
D,T
for k = 1, …, r do
GG,n|EG||V2G|,GnG
for i = n, …, 1 do
  s{u|uV2Gi,Degree(u)>1}
  (u+, v+, Gi−1) ← NodeSplit(s, Gi)
  vV2Gi1\{u+,v+}
  DD{(u+,v+,v,Gi1)}
TT{(G0,n)}
Train G2SAT with D to minimize Eq. 3

Algorithm 2.

G2SAT at inference time

Input: Graph templates T, number of output graphs r, number of proposed node pairs o
Output: Generated bipartite graphs G
for k = 1, …, r do
(G0,n)T
for i = 0, …, n − 1 do
  P
  for j = 1, …, o do
   uV2Gi,v{s|sV2Gi,(s,x)EGi,(s,¬x)EGi,xN(u)}
   P=P{(u,v)}
  (u+,v+)argmax{huThv|(u,v)P,hu=GCN(u),hv=GCN(v)}
  Gi+1 ← NodeMerge(u+, v+, Gi)
G=G{Gn}

4.4. G2SAT at Inference Time

A trained G2SAT model can be used to generate graphs. We summarize the procedure in Algorithm 2. At graph generation time, we first initialize G2SAT with a graph template sampled from T gathered at training time, which specifies the initial graph G0 and the number of generation steps n. Note that G2SAT can take bipartite graphs with arbitrary size as input and iterate for a variable number of steps. The reason we specify the initial state and the number of steps is to control the behavior of G2SAT and simplify the experiment setting.

At each generation step, we use the two-phase generation scheme described in Section 4.2. In the node proposal phase, we additionally make sure that the sampled node pair does not correspond to a vacuous clause, i.e., if u,v is a valid node pair, then ∀xN(u), we ensure that (v,x)EGi and (v,¬x)EGi. We parallelize the algorithm by sampling o random node pair proposals at once and feeding them to the node merging phase. In the node merging phase, although following Equation 2 would allow us to sample from the true distribution, we find in practice that it usually requires sampling a large number of candidate nodes pairs until a positive node pair is predicted by the GCN model. Therefore, we use a greedy algorithm that selects the most likely node pair to be merged among the o proposed node pairs and merge those nodes. Admittedly, this biases the generator away from the true data distribution. However, our experiments reveal that the synthesized graphs are nonetheless reasonable. After n steps, the generated graph Gn is saved as an output.

5. Experiments

5.1. Dataset and Evaluation

Dataset

We use 10 small real-world SAT formulas from the SATLIB benchmark library [23] and past SAT competitions.1 The two data sources contain SAT formulas generated from a variety of application domains, such as bounded model checking, planning, and cryptography. We use the standard SatElite preprocessor [13] to remove duplicate clauses and perform polynomial-time simplifications (for example, unit propagation). The preprocessed formulas contain 82 to 1122 variables and 327 to 4555 clauses.

We evaluate if the generated SAT formulas preserve the properties of the input training SAT formulas, as measured by graph statistics and SAT solver performance. We then investigate whether the generated SAT formulas can indeed help in designing better domain-specific SAT solvers.

Graph statistics

We focus on the graph statistics studied previously in the SAT literature [1, 36]. In particular, we consider the VIG, VCG and LCG representations of SAT formulas. We measure the modularity [35] (in VIG, VCG, LCG), average clustering coefficient [34] (in VIG) and the scale-free structure parameters as measured by variable αv and clause αc [1, 10] (in VCG).

SAT solver performance

We report the relative SAT solver performance, i.e., given k SAT solvers, we rank them based on their performance over the SAT formulas used for training and the generated SAT formulas, and evaluate how well the two rankings align. Previous research has shown that SAT instances can be made hard using various post-processing approaches [15, 39, 44]. Therefore, the absolute hardness of the formulas is not a good indicator of how realistic the formulas are. On the other hand, it is not trivial for a post-processing procedure to precisely manipulate the relative performance of a set of SAT solvers. Therefore, we report the relative solver performance for a fairer comparison. We took the three best performing solvers from both the application track and the random track of the 2018 SAT competition [22], which are denoted as I1, I2, I3, and R1, R2, R3 respectively.2 Our experiments confirm that solvers that are tailored to real-world SAT formulas (I1, I2, I3) indeed outperform solvers that focus on random SAT formulas (R1, R2, R3), over the training formulas. Therefore, we measure if on the generated formulas, the solvers I similarly outperform the solvers R, as measured by ranking accuracy. All the run time performances are measured by wall clock time under carefully controlled experimental settings.

Application: Developing better SAT solvers

Finally, we consider the scenario where people wish to use the synthetic formulas for developing better SAT solvers. Specifically, we use either the 10 training SAT formulas or the generated SAT formulas to guide the hyperparameter selection of a popular SAT solver called Glucose [2]. We conduct a grid search over two of its hyperparameters — the variable decay vd, that influences the ordering of the variables in the search tree, and the clause decay cd, that influences which learned clauses are to be removed [14]. We sweep over the set {0.75, 0.85, 0.95} for vd, and the set {0.7, 0.8, 0.9, 0.99, 0.999} for cd. We measure the run time of the SAT solvers using the optimal hyperparameters found by grid search, over 22 real-world SAT formulas unobserved by any of the models. Since the number of training SAT formulas is limited, we expect that using the abundant generated SAT formulas will lead to better hyperparameter choices.

5.2. Models

We compare G2SAT with two state-of-the-art generators for real-world SAT formulas. Both generators are prescribed models designed to match a specific graph property. To properly generate formulas using these baselines, we set their arguments to match the corresponding statistics in the training set. We generate 200 formulas each using G2SAT and the baseline models.

G2SAT

We implement G2SAT with a 3-layer GraphSAGE model using mean pooling and ReLU activation [20] with hidden and output embedding size of 32. We use the Adam optimizer [27] with a learning rate of 0.001 to train the model until the validation accuracy plateaus.

Community Attachment (CA)

The CA model generates formulas to fit a desired VIG modularity Q [17]. The output of the algorithm is a SAT formula with n variables and m clauses, each of length k, such that the optimal modularity for any c-partition of the VIG of the formula is approximately Q.

Popularity-Similarity (PS)

The PS model generates formulas to fit desired αv and αc [18]. The model accepts a temperature parameter T that trades off the modularity and the (αv, αc) measures of the generated formulas. We run PS with two temperature settings, T = 0 and T = 1.5.

5.3. Results

Graph statistics

The graph statistics of the generated SAT formulas are shown in Table 1. We observe that G2SAT is the only model that is able to closely fit all the graph properties that we measure, whereas the baseline models only fit some of the statistics and fail to perform well on the other statistics. Surprisingly, G2SAT fits the modularity even better than CA, which is tailored for fitting that statistic. We compute the relative error over the generated graph statistics with respect to the ground-truth statistics, and G2SAT can reduce the relative error by 24% on average compared with baseline methods. To further illustrate this performance gain, we plot the distribution of the selected properties over the generated formulas in Figure 2, where each dot corresponds to a graph. We see that G2SAT nicely interpolates and extrapolates on all the statistics of the input graphs, while the baselines only do well on some of the statistics.

Table 1:

Graph statistics of generated formulas (mean ± std. (relative error to training formulas)).

Method VIG
VCG
LCG
Clustering Modularity Variable αv Clause αc Modularity Modularity
Training 0.50±0.07 0.58±0.09 3.57±1.08 4.53±1.09 0.74±0.06 0.63±0.05

CA 0.33±0.08(34%) 0.48±0.10(17%) 6.30±1.53(76%) N/A 0.65±0.08(12%) 0.53±0.05(16%)
PS(T=0) 0.82±0.04(64%) 0.72±0.13(24%) 3.25±0.89(9%) 4.70±1.59(4%) 0.86±0.05(16%) 0.64±0.05(2%)
PS(T=1.5) 0.30±0.10(40%) 0.14±0.03(76%) 4.19±1.10(17%) 6.86±1.65(51%) 0.40±0.05(46%) 0.41±0.05(35%)
G2SAT 0.41±0.09(18%) 0.54±0.11(7%) 3.57±1.08(0%) 4.79±2.80(6%) 0.68±0.07(8%) 0.67±0.03(6%)
Figure 2:

Figure 2:

Scatter plots of distributions of selected properties of the generated formulas.

SAT solver performance

As seen in Table 2, the ranking of solver performance over the formulas generated by G2SAT and CA align perfectly with their ranking over the training graphs. Both models are able to correctly generate formulas on which application-focused solvers (I1, I2, I3) outperform random-focused solvers (R1, R2, R3). By contrast, PS models do poorly at this task.

Table 2:

Relative SAT Solver Performance on training as well as synthetic SAT formulas.

 Method  Solver ranking Accuracy
 Training I2, I3, I1, R2, R3, R1  100%

CA I2, I3, I1, R2, R3, R1 100%
 PS(T=0) R3, I3, R2, I2, I1, R1  33%
 PS(T=1.5) R3, R2, I3, I1, I2, R1  33%
 G2SAT I1, I2, I3, R2, R3, R1  100%

Application: Developing better SAT solvers

The run time gain of tuning solvers on synthetic formulas compared to tuning on a small set of real-world formulas is shown in Table 3. While all the generators are able to improve the SAT solver’s performance by suggesting different hyperparameter configurations, G2SAT is the only method that finds one that results in a large performance gain (18% faster run time) on unobserved SAT formulas. Although this experiment is limited in scale, the promising results indicate that G2SAT could open up opportunities for developing better SAT solvers, even in application domains where benchmarks are scarce.

Table 3:

Performance gain when using generated SAT formulas to tune SAT solvers.

Method Best parameters Runtime(s) Gain
Training (0.95, 0.9) 2679 N/A

CA (0.75, 0.99) 2617 2.31%
PS(T=0) (0.75, 0.999) 2668 0.41%
PS(T=1.5) (0.95, 0.9) 2677 0.07%
G2SAT (0.95, 0.99) 2190 18.25%

5.4. Analysis of Results

Scalability of G2SAT

While existing deep graph generative models can only generate graphs with up to about 1,000 nodes [6, 19, 47], the novel design of the G2SAT framework enables the generation of graphs that are an order of magnitude larger. The largest graph we have generated has 39,578 nodes and 102,927 edges, which only took 489 seconds (data-processing time excluded) on a single GPU. Figure 3 shows the time-scaling behavior for both training (from 100k batches of node pairs) and formula generation. We found that G2SAT scales roughly linearly for both tasks with respect to the number of clauses.

Figure 3:

Figure 3:

G2SAT Run time.

Extrapolation ability of G2SAT

To determine whether a trained model can learn to generate SAT instances different from those in the training set, we design an extrapolation experiment as follows. We train on 10 small formulas with 327 to 4,555 clauses, while forcing G2SAT to generate large formulas with 13,028 to 27,360 clauses. We found that G2SAT can generate large graphs whose characteristics are similar to those of the small training graphs, which shows that G2SAT has learned non-trivial properties of real-world SAT problems, and thus can extrapolate beyond the training set. Specifically, the VCG modularity of the large formulas generated by G2SAT is 0.81 ± 0.03, while the modularity of the small formulas used to train G2SAT is 0.74 ± 0.06.

Ablation study

Here we demonstrate that the expressive power of GCN model significantly affects the generated formulas. Figure 4 shows the effect of the number of layers in the GCN neural network model on the modularity of the generated formulas. As the number of layers increases, the average modularity of the generated formulas becomes closer to that of the training formulas, which indicates that machine learning contributes significantly to the efficacy of G2SAT. The other graph properties that we measured generally follow the same pattern as well.

Figure 4:

Figure 4:

Ablation study.

6. Conclusions

In this paper, we introduced G2SAT, the first deep generative model for SAT formulas. In contrast to existing SAT generators, G2SAT does not rely on hand-crafted algorithms and is able to generate diverse SAT formulas similar to input formulas, as measured by many graph statistics and SAT solver performance. While future work is called for to generate larger and harder formulas, we believe our framework shows great potential for understanding and improving SAT solvers.

Acknowledgements

Jure Leskovec is a Chan Zuckerberg Biohub investigator. We gratefully acknowledge the support of DARPA under No. FA865018C7880 (ASED) and MSC; NIH under No. U54EB020405 (Mobilize); ARO under No. 38796-Z8424103 (MURI); IARPA under No. 2017–17071900005 (HFC); NSF under No. OAC-1835598 (CINES) and HDR; Stanford Data Science Initiative, Chan Zuckerberg Biohub, Enlight Foundation, JD.com, Amazon, Boeing, Docomo, Huawei, Hitachi, Observe, Siemens, UST Global. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ONR, or the U.S. Government.

7. Appendix

7.1. Computing facilities

The G2SAT model is trained on a single NVIDIA RTX-2080Ti GPU. We evaluate the performance of SAT solvers on a cluster equipped with Intel Xeon E5–2637 v4 CPUs running Ubuntu 16.04 and we dedicated 2 cores, 8000 MB RAM for each job. For each formula, we gave each solver a 10 minutes timeout.

7.2. Evaluation

We used an implementation of the Louvain Algorithm to measure the modularities [5].

We used an implementation of the maximum likelihood method for computing an estimate of αv and αc [10, 1].

7.3. Removing Trivial Components of SAT formulas Generated by PS

We found that almost all formulas generated by the PS model were trivially unsatisfiable when it is tasked with generated formulas with similar metrics as formulas in the traning set. In order to better demonstrate the SAT solver behaviors on the PS model, we increased the difficulty of its generated formulas by performing a lightweight post-processing step that iteratively removes clauses leading to short unsatisfiable proof. Concretely, for each generated formula, we used a SAT solver (Picosat [3]) to try to solve it within a given conflict1 budget. If the solver were able to prove that the formula is unsatisfiable within that budget, we removed a clause from the unsatisfiable core (i.e., a subset of all clauses that is unsatisfiable) that Picosat returned and repeated the same process on the pruned formula, until it became satisfiable or no longer solvable within the conflict budget. We found that this efficient approach, only removes a low proportion of clauses and has small impact on the graph theoretic properties, but significantly increases the difficulty of the formulas generated by PS.

7.4. More details on baseline methods

Community Attachment (CA)

The CA model generates formulas to fit a desired VIG modularity [17]. The model takes in five inputs n, m, k, c, Q, where n is the number of variables, m the number of clauses, k the length of each clause, c the size of a partition of the VIG, and Q is the desired VIG modularity. The output of the algorithm is a SAT formula with n variables and m clauses, each of length k, such that the optimal modularity for any c-partition of the VIG of the formula is approximately Q.

Popularity-Similarity (PS)

The PS model generates formulas to fit desired αv and αc [18]. In addition, the formulas generated by PS are guaranteed to have high modularity. The model takes in seven inputs n, m, k, K, αv, αc, T, where n, m are the same with CA, k the minimum clause length, K the average clause length, and T a hyper-parameter that decides the trade-off between modularity and αv,αc. We use two versions of PS, with T = 0 and T = 1.5.

Footnotes

1

Link to code and datasets: http://snap.stanford.edu/g2sat/

2

Any SAT formula can be converted to an equisatisfiable CNF formula in linear time [41].

2

The solvers are, in order, MapleLCMDistChronoBT, Maple_LCM_Scavel_fix2, Maple_CM, Sparrow2Riss-2018, gluHack, glucose-3.0_PADC_10_NoDRUP [22].

1

A SAT solver finds a conflict when it finds a partial assignment of values to variables that cannot satisfy the formula.

References

  • [1].Ansótegui C, Bonet ML, and Levy J On the structure of industrial sat instances. In Principles and Practice of Constraint Programming, pages 127–141. Springer Berlin Heidelberg, 2009. [Google Scholar]
  • [2].Audemard G and Simon L Predicting learnt clauses quality in modern sat solvers. In Proceedings of the 21st International Joint Conference on Artifical Intelligence, pages 399–404, 2009. [Google Scholar]
  • [3].Biere A Picosat essentials. JSAT, 4:75–97, 2008. [Google Scholar]
  • [4].Biere A, Biere A, Heule M, van Maaren H, and Walsh T Handbook of Satisfiability: Volume 185 Frontiers in Artificial Intelligence and Applications. IOS Press, 2009. [Google Scholar]
  • [5].Blondel VD, Guillaume J-L, Lambiotte R, and Lefebvre E Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008. [Google Scholar]
  • [6].Bojchevski A, Shchur O, Zügner D, and Günnemann S NetGAN: Generating graphs via random walks. In Proceedings of the 35th International Conference on Machine Learning, 2018. [Google Scholar]
  • [7].Boufkhad Y, Dubois O, Interian Y, and Selman B Regular random k-sat: Properties of balanced formulas. J. Autom. Reason, 35:181–200, 2005. [Google Scholar]
  • [8].Braunstein A, Mézard M, and Zecchina R Survey propagation: An algorithm for satisfiability. Random Struct. Algorithms, 27:201–226, 2005. [Google Scholar]
  • [9].Clarke E, Biere A, Raimi R, and Zhu Y Bounded model checking using satisfiability solving. Form. Methods Syst. Des, 19:7–34, 2001. [Google Scholar]
  • [10].Clauset A, Shalizi CR, and Newman MEJ Power-law distributions in empirical data. SIAM Review, 51:661–703, 2009. [Google Scholar]
  • [11].Cook SA The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151–158. ACM, 1971. [Google Scholar]
  • [12].De Cao N and Kipf T Molgan: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018. [Google Scholar]
  • [13].Eén N and Biere A Effective preprocessing in sat through variable and clause elimination. In International conference on theory and applications of satisfiability testing, pages 61–75. Springer, 2005. [Google Scholar]
  • [14].Eén N and Sörensson N An extensible sat-solver. In Theory and Applications of Satisfiability Testing, pages 502–518. Springer Berlin Heidelberg, 2004. [Google Scholar]
  • [15].Escamocher G, O’Sullivan B, and Prestwich SD Generating difficult sat instances by preventing triangles. CoRR, abs/1903.03592, 2019. [Google Scholar]
  • [16].Ganzinger H, Hagen G, Nieuwenhuis R, Oliveras A, and Tinelli C Dpll(t): Fast decision procedures. In Alur R and Peled DA, editors, Computer Aided Verification, pages 175–188. Springer Berlin Heidelberg, 2004. [Google Scholar]
  • [17].Giráldez-Cru J and Levy J A modularity-based random sat instances generator. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 1952–1958. AAAI Press, 2015. [Google Scholar]
  • [18].Giráldez-Cru J and Levy J Locality in random sat instances. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 638–644, 2017. [Google Scholar]
  • [19].Grover A, Zweig A, and Ermon S Graphite: Iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459, 2018. [Google Scholar]
  • [20].Hamilton W, Ying Z, and Leskovec J Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, 2017. [Google Scholar]
  • [21].Hamilton WL, Ying R, and Leskovec J Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 2017. [Google Scholar]
  • [22].Heule MJ, Järvisalo MJ, Suda M, et al. Proceedings of sat competition 2018. 2018. [Google Scholar]
  • [23].Hoos HH and Stützle T Satlib: An online resource for research on sat. Sat, 2000:283–292, 2000. [Google Scholar]
  • [24].Jin W, Barzilay R, and Jaakkola T Junction tree variational autoencoder for molecular graph generation. International Conference on Machine Learning, 2018. [Google Scholar]
  • [25].Katsirelos G and Simon L Eigenvector centrality in industrial sat instances. In Principles and Practice of Constraint Programming, pages 348–356. Springer Berlin Heidelberg, 2012. [Google Scholar]
  • [26].Kautz H and Selman B Planning as satisfiability. In Proceedings of the 10th European Conference on Artificial Intelligence, pages 359–363. John Wiley & Sons, Inc., 1992. [Google Scholar]
  • [27].Kingma DP and Ba J Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [Google Scholar]
  • [28].Kipf TN and Welling M Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016. [Google Scholar]
  • [29].Kipf TN and Welling M Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations, 2017. [Google Scholar]
  • [30].Lauria M, Elffers J, Nordström J, and Vinyals M Cnfgen: A generator of crafted benchmarks. In Gaspers S and Walsh T, editors, Theory and Applications of Satisfiability Testing, pages 464–473. Springer International Publishing, 2017. [Google Scholar]
  • [31].Li Y, Vinyals O, Dyer C, Pascanu R, and Battaglia P Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018. [Google Scholar]
  • [32].Marques-Silva J, Lynce I, and Malik S Conflict-driven clause learning sat solvers. In Handbook of Satisfiability, 2009. [Google Scholar]
  • [33].Mull N, Fremont DJ, and Seshia SA On the hardness of sat with community structure. In Creignou N and Le Berre D, editors, Theory and Applications of Satisfiability Testing, pages 141–159. Springer International Publishing, 2016. [Google Scholar]
  • [34].Newman ME Clustering and preferential attachment in growing networks. Physical review E, 64(2), 2001. [DOI] [PubMed] [Google Scholar]
  • [35].Newman ME Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [36].Newsham Z, Ganesh V, Fischmeister S, Audemard G, and Simon L Impact of community structure on sat solver performance. In Theory and Applications of Satisfiability Testing, pages 252–268. Springer International Publishing, 2014. [Google Scholar]
  • [37].Selman B, Kautz H, and Cohen B Local search strategies for satisfiability testing. Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, 26, 09 1999. [Google Scholar]
  • [38].Selman B, Kautz H, and McAllester D Ten challenges in propositional reasoning and search. In Proceedings of the 15th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’97, pages 50–54, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc. [Google Scholar]
  • [39].Selman B, Mitchell DG, and Levesque HJ Generating hard satisfiability problems. Artif. Intell, 81:17–29, 1996. [Google Scholar]
  • [40].Selsam D, Lamm M, Bünz B, Liang P, de Moura L, and Dill DL Learning a sat solver from single-bit supervision. arXiv preprint arXiv:1802.03685, 2018. [Google Scholar]
  • [41].TSEITIN GS On the complexity of derivation in propositional calculus. Structures in Constructive Mathematics and Mathematical Logic, pages 115–125, 1968. [Google Scholar]
  • [42].Valsesia D, Fracastoro G, and Magli E Learning localized generative models for 3d point clouds via graph convolution. International Conference on Learning Representations, 2019. [Google Scholar]
  • [43].Wu H and Ramanujan R Learning to generate industrial sat instances. In Proceedings of the 12th International Symposium on Combinatorial Search, pages 17–29. AAAI Press, 2019. [Google Scholar]
  • [44].Xu K, Boussemart F, Hemery F, and Lecoutre C A simple model to generate hard satisfiable instances. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005. [Google Scholar]
  • [45].You J, Liu B, Ying R, Pande V, and Leskovec J Graph convolutional policy network for goal-directed molecular graph generation. Advances in Neural Information Processing Systems, 2018. [Google Scholar]
  • [46].You J, Ying R, and Leskovec J Position-aware graph neural networks. International Conference on Machine Learning, 2019. [Google Scholar]
  • [47].You J, Ying R, Ren X, Hamilton W, and Leskovec J GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, 2018. [Google Scholar]

RESOURCES