On the Sparsity of XORs in Approximate Model Counting

Durgesh Agrawal; Bhavishya; Kuldeep S Meel

2020 Jun 26;12178:250–266. doi: 10.1007/978-3-030-51825-7_18

Editors: Luca Pulina, Martina Seidl

PMCID: PMC7326560

Abstract

Given a Boolean formula $\varphi$, the problem of model counting, also referred to as #SAT, is to compute the number of solutions of $\varphi$. Hashing-based techniques for approximate counting have emerged as a dominant approach, promising both scalability and rigorous theoretical guarantees. The standard construction of strongly 2-universal hash functions employs dense XORs (i.e., involving half of the variables in expectation), which is widely known to degrade the runtime performance of state of the art SAT solvers. Consequently, the past few years have witnessed intense activity in the design of sparse XORs as hash functions. Such constructions have been proposed in the belief that they provide runtime performance improvement along with theoretical guarantees similar to those of dense XORs.

The primary contribution of this paper is a rigorous theoretical and empirical analysis to understand the effect of the sparsity of XORs. Contrary to the prior belief that the analysis for sparse hash functions applies to all hashing-based techniques, we show that the best-known bounds obtained for sparse XORs are still too weak to yield theoretical guarantees for a large class of hashing-based techniques, including the state of the art approach ApproxMC3. We then turn to a rigorous empirical analysis of the performance benefits of sparse hash functions. To this end, we first design, to the best of our knowledge, the most efficient algorithm using sparse hash functions, called SparseCount2, which achieves up to two orders of magnitude performance improvement over its predecessor. Contradicting current beliefs, we observe that SparseCount2 still falls short of ApproxMC3 in runtime performance despite the usage of dense XORs in ApproxMC3. In conclusion, our work showcases that the question of whether it is possible to use short XORs to achieve scalability while providing strong theoretical guarantees is still wide open.

Background and Introduction

Given a Boolean formula $\varphi$, the problem of model counting, also referred to as #SAT, is to compute the number of solutions of $\varphi$. Model counting is a fundamental problem in computer science with a wide range of applications, including quantified information flow, reliability of networks, probabilistic programming, Bayesian networks, and others [4, 5, 10, 16, 21–23].

Given the computational intractability of #SAT, attention has been focused on the approximation of #SAT [28, 30]. In a breakthrough result, Stockmeyer provided a hashing-based randomized approximation scheme for counting that makes polynomially many invocations of an NP oracle [27]. The procedure, however, was computationally prohibitive in practice at that time, and no practical tools existed based on Stockmeyer’s proposed algorithmic framework until the early 2000s [16]. Motivated by the success of SAT solvers, there has been a surge of interest in the design of hashing-based techniques for approximate model counting in the past decade [8, 9, 13, 15, 24, 25].

The core idea of the hashing-based framework is to employ pairwise independent hash functions¹ to partition the solution space into roughly equal-sized small cells, wherein a cell is called small if its number of solutions is less than or equal to a pre-computed threshold, denoted by $\mathsf{thresh}$. A SAT solver is employed to check if a cell is small by enumerating solutions one-by-one until either there are no more solutions or we have already enumerated $\mathsf{thresh} + 1$ solutions. The current state of the art techniques can be broadly classified into two categories:

1. The first category of techniques, henceforth called Cat1, consists of techniques that compute a constant factor approximation by setting $\mathsf{thresh}$ to a constant and use Stockmeyer’s technique of constructing multiple copies of the input formula [1, 2, 12, 29, 31].
2. The second category of techniques, henceforth called Cat2, consists of techniques that directly compute an $(\varepsilon, \delta)$-estimate by setting $\mathsf{thresh}$ to $\mathcal{O}(1/\varepsilon^2)$, and hence invoking the underlying NP oracle $\mathcal{O}(1/\varepsilon^2)$ times [7–9, 20, 21, 24, 25].

The current state of the art technique, measured by runtime performance, is ApproxMC3, which falls into the class of Cat2 techniques [25]. The proofs of correctness for all the hashing-based techniques involve the use of concentration bounds due to pairwise independent hash functions.

The standard construction of pairwise independent hash functions employed in these techniques can be expressed as a conjunction of XOR constraints such that every variable is chosen with probability $1/2$ for each XOR. As such, each XOR contains, on average, $n/2$ variables. A SAT solver is invoked to enumerate solutions of the formula $\varphi$ in conjunction with these XOR constraints. The performance of SAT solvers, however, degrades with an increase in the size of XORs [15]. Therefore, recent efforts have focused on the design of hash functions where each variable is chosen with probability less than $1/2$ [1, 2, 11, 14, 17]. We refer to the XORs constructed with probability $1/2$ as dense XORs and to those constructed with probability less than $1/2$ as sparse XORs. In particular, given a hash function $h$ and cell $\alpha$, the random variable of interest, denoted by $\mathsf{Cnt}_{\langle \varphi, h, \alpha \rangle}$, is the number of solutions of $\varphi$ that $h$ maps to cell $\alpha$. The pairwise independence of dense XORs is known to bound the variance of $\mathsf{Cnt}_{\langle \varphi, h, \alpha \rangle}$ by its expectation, which is sufficient for their usage in both Cat1 and Cat2 techniques.

In a significant result, Asteris and Dimakis [3], and Zhao et al. [31] showed that a probability of $\mathcal{O}\left(\frac{\log n}{n}\right)$ asymptotically suffices for Cat1 techniques. It is worth pointing out that such a probability provides weaker guarantees on the variance of $\mathsf{Cnt}_{\langle \varphi, h, \alpha \rangle}$ as compared to the case when the probability is $1/2$. However, Zhao et al. showed that the weaker guarantees are sufficient for Cat1 techniques with only polynomial overhead on the time complexity. Furthermore, Zhao et al. provided necessary and sufficient conditions on the required asymptotic value of the density $f$ and proposed a new algorithm, SparseCount, that uses the proposed family of hash functions. One would expect that the result of Zhao et al. would settle the quest for efficient hash functions. However, upon closer examination, a few questions have been left unanswered in Zhao et al.’s work and subsequent follow-up studies [1, 9, 21].

1. Can the hash function constructed by Zhao et al. be used for Cat2 techniques, in particular for state of the art hashing-based techniques like ApproxMC3?
2. In practice, can the overhead due to the weakness of theoretical guarantees of sparse XORs proposed by Zhao et al. be compensated by the gain of performance due to sparse XORs in the runtime of SparseCount?
3. Is the runtime performance of SparseCount competitive with that of ApproxMC3? The reader may observe that Zhao et al.’s paper does not compare their proposed algorithm for $(\varepsilon, \delta)$-guarantees, called SparseCount, with state of the art algorithms at that time such as ApproxMC2, which is now in its third version, ApproxMC3 [25]. Therefore, the question of whether the proposed sparse XORs are efficient in runtime was not settled. It is perhaps worth remarking that another line of work, based on the construction of sparse XORs using low-density parity-check codes, is known to introduce significant slowdown [1, 2] (see Section 9 of [1]).

The primary contribution of this paper is a rigorous theoretical and empirical analysis to understand the effect of sparse XORs for approximate model counters. In particular, we make the following key contributions:

1. We prove that the bounds obtained by Zhao et al., which are the strongest known bounds at this point for the variance of $\mathsf{Cnt}_{\langle \varphi, h, \alpha \rangle}$, are still too weak for the analysis of ApproxMC3. To the best of our knowledge, this is the first time the need for stronger bounds in the context of Cat2 techniques has been identified.
2. Since the weakness of bounds prevents usage of sparse hash functions in ApproxMC3, we design the most efficient algorithm, to the best of our knowledge, using sparse hash functions. To this end, we propose an improvement of SparseCount, called SparseCount2, that reduces the number of SAT calls from linear to logarithmic and significantly improves the runtime performance of SparseCount. The improvement from linear to logarithmic uses the idea of prefix-slicing introduced by Chakraborty, Meel, and Vardi [9] for ApproxMC2.
3. We next present a rigorous empirical study, involving a benchmark suite totaling over 1800 instances, of the runtime performance of SparseCount2 vis-a-vis the state of the art approximate counting technique, ApproxMC3. Surprisingly and contrary to current beliefs, we discover that ApproxMC3, which uses dense XORs, significantly outperforms SparseCount2 for every benchmark. It is worth remarking that both SparseCount2 and ApproxMC3 use an identical SAT solver for the underlying SAT calls and, similar to other hashing-based techniques, over 99% of the runtime of each algorithm is consumed by the underlying SAT solver.

Given the surprising nature of our results, a few words are in order. First of all, our work identifies the tradeoffs involved in the usage of sparse hash functions and demonstrates that the variance bounds offered by sparse hash functions are too weak to be employed in the state of the art techniques. Secondly, our work demonstrates that the weakness of variance bounds leads to such a large overhead that the algorithms using sparse hash functions scale much worse compared to the algorithms without sparse XORs. Thirdly and finally, we believe the negative results showcase that the question of the usage of sparse XORs to achieve scalability while providing strong theoretical guarantees is still wide open. In an upcoming work, Meel ⓡ Akshay² [20] define a new family of hash functions, called concentrated hashing, provide a new construction of sparse hash functions belonging to concentrated hashing, and design a new algorithmic framework on top of ApproxMC3, which is shown to achieve runtime improvements.

The rest of the paper is organized as follows. We discuss notations and preliminaries in Sect. 2. We then discuss the weakness of guarantees offered by sparse XORs in Sect. 3. In Sect. 4, we seek to design an efficient algorithm that utilizes all the advancements, to the best of our knowledge, in the approximate model counting community. We present a rigorous empirical study comparing the performance of SparseCount, SparseCount2, and ApproxMC3 in Sect. 5 and conclude in Sect. 6.

Preliminaries and Notations

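Let $\varphi$ be a Boolean formula in conjunctive normal form (CNF), and let $\mathsf{Vars}(\varphi)$ be the set of variables appearing in $\varphi$. The set $\mathsf{Vars}(\varphi)$ is also called the support of $\varphi$. Unless otherwise stated, we will use $n$ to denote the number of variables in $\varphi$, i.e., $|\mathsf{Vars}(\varphi)|$. An assignment of truth values to the variables in $\mathsf{Vars}(\varphi)$ is called a satisfying assignment or witness of $\varphi$ if it makes $\varphi$ evaluate to true. We denote the set of all witnesses of $\varphi$ by $\mathsf{sol}(\varphi)$.

We write $\Pr[Z]$ to denote the probability of outcome $Z$. The expected value of $Z$ is denoted $\mathsf{E}[Z]$ and its variance is denoted $\sigma^2[Z]$.

The propositional model counting problem is to compute $|\mathsf{sol}(\varphi)|$ for a given CNF formula $\varphi$. A probably approximately correct (PAC) counter is a probabilistic algorithm that takes as inputs a formula $\varphi$, a tolerance $\varepsilon > 0$, and a confidence parameter $\delta \in (0, 1]$, and returns a count $c$ with $(\varepsilon, \delta)$-guarantees, i.e., $\Pr\left[\frac{|\mathsf{sol}(\varphi)|}{1+\varepsilon} \le c \le (1+\varepsilon)\,|\mathsf{sol}(\varphi)|\right] \ge 1 - \delta$.

In this work, we employ a family of universal hash functions. Let $H(n, m) \triangleq \{h : \{0,1\}^n \rightarrow \{0,1\}^m\}$ be a family of hash functions mapping $\{0,1\}^n$ to $\{0,1\}^m$. We use $h \xleftarrow{R} H(n, m)$ to denote the probability space obtained by choosing a function $h$ uniformly at random from $H(n, m)$.

In this work, we will use the concept of prefix-slicing introduced by Chakraborty et al. [9]. For $h \in H(n, m)$, formally, for every $j \in \{1, \ldots, m\}$, the $j$-th prefix-slice of $h$, denoted $h^{(j)}$, is a map from $\{0,1\}^n$ to $\{0,1\}^j$, such that $h^{(j)}(y)[i] = h(y)[i]$, for all $y \in \{0,1\}^n$ and for all $i \in \{1, \ldots, j\}$. Similarly, the $j$-th prefix-slice of $\alpha$, denoted $\alpha^{(j)}$, is an element of $\{0,1\}^j$ such that $\alpha^{(j)}[i] = \alpha[i]$ for all $i \in \{1, \ldots, j\}$. The randomness in the choices of $h$ and $\alpha$ induces randomness in the choices of $h^{(j)}$ and $\alpha^{(j)}$. However, the $(h^{(j)}, \alpha^{(j)})$ pairs chosen for different values of $j$ are no longer independent. Specifically, $h^{(j)}(y)[i] = h^{(k)}(y)[i]$ and $\alpha^{(j)}[i] = \alpha^{(k)}[i]$ for $1 \le j < k \le m$ and for all $i \in \{1, \ldots, j\}$.

For a formula $\varphi$, $h \in H(n, n)$, and $\alpha \in \{0,1\}^n$, we define $\mathsf{Cnt}_{\langle \varphi, i \rangle} := |\{y \in \mathsf{sol}(\varphi) \mid h^{(i)}(y) = \alpha^{(i)}\}|$, i.e., the number of solutions of $\varphi$ mapped to $\alpha^{(i)}$ by $h^{(i)}$. For the sake of notational clarity, whenever $h^{(i)}$ and $\alpha^{(i)}$ are clear from the context, we will use $\mathsf{Cnt}_{\langle i \rangle}$ as a shorthand for $\mathsf{Cnt}_{\langle \varphi, i \rangle}$.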
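The prefix-slice and cell-count definitions above can be illustrated with a minimal Python sketch. The toy hash function, the toy solution set, and the names `prefix_slice`, `cnt`, and `toy_hash` are assumptions made purely for illustration; they are not taken from any of the tools discussed in this paper.

```python
from itertools import product

def prefix_slice(h, j):
    """Return the j-th prefix slice of h: keep only the first j output bits."""
    return lambda y: h(y)[:j]

def cnt(solutions, h, alpha, i):
    """Number of solutions y whose i-bit prefix hash equals the i-bit prefix of alpha."""
    h_i = prefix_slice(h, i)
    return sum(1 for y in solutions if h_i(y) == alpha[:i])

def toy_hash(y):
    """Hypothetical hash: output bit k is y[k] XOR y[k+1] (indices wrap around)."""
    return tuple(y[k] ^ y[(k + 1) % len(y)] for k in range(len(y)))

# Toy "formula": all 4-bit assignments with at least two bits set.
solutions = [y for y in product((0, 1), repeat=4) if sum(y) >= 2]
alpha = (0, 0, 0, 0)

# Cell counts for prefix lengths 0..4; each extra hash bit can only shrink the cell.
counts = [cnt(solutions, toy_hash, alpha, i) for i in range(5)]
print(counts)
```

Because every prefix slice reuses the output bits of the shorter slices, the counts are non-increasing in the prefix length, which is the monotonicity that prefix search relies on.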

Definition 1

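[6] A family of hash functions $H(n, m)$ is pairwise independent (also known as strongly 2-universal) if $\forall \alpha_1, \alpha_2 \in \{0,1\}^m$, distinct $y_1, y_2 \in \{0,1\}^n$, $h \xleftarrow{R} H(n, m)$, we have $\Pr[h(y_1) = \alpha_1 \wedge h(y_2) = \alpha_2] = \frac{1}{2^{2m}}$.

Definition 2

Let $A \in \{0,1\}^{m \times n}$ be a random matrix whose entries are Bernoulli i.i.d. random variables such that $f_i = \Pr[A[i, j] = 1]$ for all $j \in [n]$. Let $b \in \{0,1\}^m$ be chosen uniformly at random, independently from $A$. Let $h_{A,b}(y) = Ay + b$ and $H^{\{f_i\}}(n, m) = \{h_{A,b} : \{0,1\}^n \rightarrow \{0,1\}^m\}$, where $h_{A,b} \xleftarrow{R} H^{\{f_i\}}(n, m)$ is chosen randomly according to this process. Then, $H^{\{f_i\}}(n, m)$ is defined as the hash family with $\{f_i\}$-sparsity.

Since we can represent hash functions in $H^{\{f_i\}}(n, m)$ using a set of XORs, we will use dense XORs to refer to hash functions with $f_i = \frac{1}{2}$ for all $i$, while we use sparse XORs to refer to hash functions with $f_i < \frac{1}{2}$ for some $i$. Note that $H^{\{f_i = \frac{1}{2}\}}(n, m)$ is the standard pairwise independent hash family, also denoted as $H_{xor}(n, m)$ in earlier works [21].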
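As a rough illustration of Definition 2, the following Python sketch samples a hash function of the form $Ay + b$ over GF(2) with a per-row density and compares the average XOR length for a dense and a sparse choice. The function name `sample_hash`, the sizes, the seed, and the sparse density 0.05 are assumptions chosen only for this example.

```python
import random

def sample_hash(n, m, densities, rng):
    """Sample h(y) = Ay + b over GF(2); entries of row i are i.i.d. Bernoulli(densities[i])."""
    A = [[1 if rng.random() < densities[i] else 0 for _ in range(n)]
         for i in range(m)]
    b = [rng.randrange(2) for _ in range(m)]

    def h(y):
        # Each output bit is one XOR constraint: the parity of the selected variables plus b_i.
        return tuple((sum(a & v for a, v in zip(row, y)) + b_i) % 2
                     for row, b_i in zip(A, b))

    return A, b, h

rng = random.Random(0)
n, m = 200, 5
A_dense, _, h_dense = sample_hash(n, m, [0.5] * m, rng)     # dense XORs
A_sparse, _, h_sparse = sample_hash(n, m, [0.05] * m, rng)  # sparse XORs (assumed density)

# Average number of variables per XOR: close to n/2 when dense, close to 0.05 * n when sparse.
dense_len = sum(map(sum, A_dense)) / m
sparse_len = sum(map(sum, A_sparse)) / m
print(dense_len, sparse_len)
```

Each row of `A` together with the matching bit of `b` corresponds to one XOR constraint, so the row weight is exactly the XOR length that the solver has to handle.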

Definition 3

[11] Let Inline graphic and . Let Z be a random variable with . Then Z is strongly -concentrated if and weakly -concentrated if both ] and Pr[Z ] .

Related Work

Gomes et al. [14] first identified the improvements in solving time due to the usage of sparse XORs in approximate model counting algorithms. The question of whether sparse XORs can provide the required theoretical guarantees was left open. Significant progress in this direction was achieved by Ermon et al. [11], who provided the first rigorous analysis of the usage of sparse XOR constraints. Building on Ermon et al., Zhao et al. [31] and Asteris and Dimakis [3] independently provided a further improved analysis and showed that a probability of $\mathcal{O}\left(\frac{\log n}{n}\right)$ suffices to provide a constant factor approximation, which can be amplified to a $(1+\varepsilon)$ approximation.

While the above-mentioned efforts focused on each entry of $A$ being i.i.d., Achlioptas and Theodoropoulos [2] and Achlioptas, Hammoudeh, and Theodoropoulos [1] investigated the design of hash functions where $A$ is a structured matrix by drawing on connections to error-correcting codes. While their techniques provide a construction of sparse constraints, the constants involved in the asymptotics lead to impractical algorithms for $(\varepsilon, \delta)$ guarantees (see Sect. 9 of [1]). The work of Achlioptas et al. demonstrates the promise and limitations of structured random matrices in the design of hashing-based algorithms; however, there is no such study for the case when all the entries are i.i.d. In this paper, we theoretically improve the construction proposed by Asteris and Dimakis [3], and Zhao et al. [31] and perform a rigorous empirical study to understand the tradeoffs of sparsity.

Weakness of Guarantees Offered by Sparse XORs

In this section, we present the first contribution of this paper: a demonstration of the weakness of theoretical guarantees obtained in prior work [3, 11, 31] for sparse XORs. To this end, we investigate whether the bounds offered by Zhao et al. on the variance of $\mathsf{Cnt}_{\langle \varphi, i \rangle}$, which are the strongest bounds known on sparse XORs, can be employed in the analysis of Cat2 techniques. For clarity of exposition, we focus on the usage of sparse XOR bounds in ApproxMC3, but our conclusions extend to other Cat2 techniques, as pointed out below.

The analysis of ApproxMC3 employs the bounds on the variance of $\mathsf{Cnt}_{\langle \varphi, i \rangle}$ using the following standard concentration bounds.

Lemma 1

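For every $0 < \beta < 1$, $0 < \varepsilon < 1$, $0 \le i \le n$, we have:

1. Chebyshev Inequality: $\Pr\left[\left|\mathsf{Cnt}_{\langle \varphi, i \rangle} - \mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]\right| \ge \frac{\varepsilon}{1+\varepsilon}\,\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]\right] \le \frac{(1+\varepsilon)^2\,\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}]}{\varepsilon^2\,\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]^2}$
2. Paley-Zygmund Inequality: $\Pr\left[\mathsf{Cnt}_{\langle \varphi, i \rangle} \le \beta\,\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]\right] \le \frac{1}{1 + \frac{(1-\beta)^2\,\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]^2}{\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}]}}$

The analysis of Cat2 techniques (and ApproxMC3 in particular) bounds the failure probability of the underlying algorithm by upper bounding the above expressions for appropriately chosen values of $i$. To obtain meaningful upper bounds, these techniques employ the inequality $\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}] \le \mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]$ obtained via the usage of 2-universal hash functions³.

Recall that the core idea of the hashing-based framework is to employ 2-universal hash functions to partition the solution space into roughly equal-sized small cells, wherein a cell is called small if its number of solutions is less than or equal to a pre-computed threshold, denoted by $\mathsf{thresh}$, which is chosen as $\mathcal{O}(1/\varepsilon^2)$. To this end, the analysis lower bounds $\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]$ by a constant fraction of $\mathsf{thresh}$, which allows the denominator to be lower bounded by a constant. Given that $\mathsf{thresh}$ can be set to a larger polynomial in $1/\varepsilon$, we can relax the requirement on the chosen hash family to ensuring $\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}] \le \mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]^{2-\kappa}$ for some $\kappa > 0$. Note that pairwise independent hash functions based on dense XORs ensure $\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}] \le \mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]$ (i.e., $\kappa = 1$).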
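To see why the shape of the variance bound matters, the following short derivation works through Chebyshev's inequality in its generic textbook form. Here $Z$, $\mu$, $\beta$, and $c$ are generic symbols introduced only for this illustration (a random variable with mean $\mu > 0$, a relative deviation $\beta > 0$, and a constant $c > 0$); they are not necessarily the exact parametrization used in Lemma 1.

```latex
\begin{align*}
\Pr\bigl[\,|Z - \mu| \ge \beta\mu\,\bigr] &\le \frac{\sigma^2[Z]}{\beta^2\mu^2}
  && \text{(Chebyshev)} \\
\sigma^2[Z] \le \mu \;&\Longrightarrow\;
  \Pr\bigl[\,|Z - \mu| \ge \beta\mu\,\bigr] \le \frac{1}{\beta^2\mu}
  && \text{(small once } \mu \gg 1/\beta^2\text{)} \\
\sigma^2[Z] \le \mu^{2-\kappa} \;&\Longrightarrow\;
  \Pr\bigl[\,|Z - \mu| \ge \beta\mu\,\bigr] \le \frac{1}{\beta^2\mu^{\kappa}}
  && \text{(small once } \mu \gg \beta^{-2/\kappa}\text{)} \\
\sigma^2[Z] \le c\,\mu^2 \;&\Longrightarrow\;
  \Pr\bigl[\,|Z - \mu| \ge \beta\mu\,\bigr] \le \frac{c}{\beta^2}
  && \text{(does not improve as } \mu \text{ grows)}
\end{align*}
```

In the first two cases a larger expected cell size drives the failure probability down, so raising the threshold helps. In the last case, where the variance is only known to be bounded by a constant multiple of $\mu^2$, no choice of threshold makes the Chebyshev bound meaningful for small $\beta$.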

We now investigate the guarantees provided by sparse XORs. To this end, we first recall the following result, which follows from combining Theorem 1 and Theorem 3 of [11].

Lemma 2

[11]4 For Inline graphic , let

For Inline graphic , we have:

Zhao et al. [31], building on Ermon et al. [11], obtain the following bound (see, Lemma 8 and Lemma 10 of [31]).

Lemma 3

[31] Define Inline graphic . Then for .

The bound on $\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}]$ from Zhao et al. can be stated as:

Theorem 1

Inline graphic where .

Proof

(Substituting Inline graphic

Substituting Inline graphic , we have:

Using Corollary 3, we have Inline graphic .

Recall that the analysis of ApproxMC3 requires us to upper bound $\sigma^2[\mathsf{Cnt}_{\langle \varphi, i \rangle}]$ by $\mathsf{E}[\mathsf{Cnt}_{\langle \varphi, i \rangle}]^{2-\kappa}$ for some $\kappa > 0$. Since the best-known bounds for sparse XORs do not provide an upper bound of this form, they are not sufficient to be used by ApproxMC3. At this point, one may wonder what key algorithmic difference between Cat1 and Cat2 necessitates the use of stronger bounds: Cat1 techniques compute a constant factor approximation and then make use of Stockmeyer’s argument to lift it to a $(1+\varepsilon)$-approximation, whereas Cat2 techniques directly compute a $(1+\varepsilon)$-approximation, which necessitates the usage of stronger concentration bounds.

SparseCount2: An Efficient Algorithm for Sparse XORs

The inability of sparse XORs to provide good enough bounds on variance for usage in Cat2 techniques, in particular ApproxMC3, leads us to ask: how do we design the most efficient algorithm for approximate model counting making use of the existing gadgets in the model counting literature? We recall that Zhao et al. [31] provided matching necessary and sufficient conditions on the required asymptotic density of the matrix $A$. Furthermore, they proposed a hashing-based algorithm, SparseCount, that utilizes sparser constraints.

As mentioned earlier, Chakraborty et al. [9] proposed the technique of using prefix-slicing of hash functions in the context of hashing-based techniques, and their evaluation demonstrated significant theoretical and empirical improvements owing to the usage of prefix hashing. In this work, we first show a dramatic reduction in the complexity of SparseCount by utilizing the concept of prefix-slicing, thereby improving the number of SAT calls from $\mathcal{O}(n \log n)$ to $\mathcal{O}((\log n)^2)$ for fixed $\varepsilon$ and $\delta$. The modified algorithm, called SparseCount2, significantly outperforms SparseCount, as demonstrated in Sect. 5.

Algorithm 1 shows the pseudo-code for SparseCount2. SparseCount2 assumes access to a SAT oracle that takes in a formula $\varphi$ and returns YES if $\varphi$ is satisfiable, otherwise it returns NO. Furthermore, SparseCount2 assumes access to a subroutine that creates multiple copies of a given formula, a standard technique first proposed by Stockmeyer [27] to lift a constant factor approximation to a $(1+\varepsilon)$-factor approximation for arbitrary $\varepsilon$. Similar to Algorithm 1 of [11], we choose the densities in line 5 such that the resulting hash functions guarantee weak concentration (Definition 3) for the random variable $\mathsf{Cnt}_{\langle i \rangle}$ for all $i$. SparseCount2 shares similarity with SparseCount, with the core difference being the replacement of the linear search in SparseCount with a binary-search procedure, which in turn resembles the core procedure of Chakraborty et al. [9]. The search subroutine employs prefix search, which ensures that for all $i$, $\mathsf{Cnt}_{\langle i \rangle} \ge \mathsf{Cnt}_{\langle i+1 \rangle}$. The monotonicity of $\mathsf{Cnt}_{\langle i \rangle}$ allows us to perform a binary search to find the value of $i$ for which $\mathsf{Cnt}_{\langle i \rangle} \ge 1$ and $\mathsf{Cnt}_{\langle i+1 \rangle} = 0$. Consequently, we make $\mathcal{O}(\log n)$ calls to the underlying SAT oracle during each invocation of the search subroutine instead of $\mathcal{O}(n)$ calls in the case of SparseCount. Note that the search subroutine is invoked $T$ times (with $T$ as defined in the algorithm) and each returned value is added to a list. We then return the median of this list.

It is worth noting that SparseCount2 and ApproxMC3 differ only in the usage of $\mathsf{thresh}$, which is set to 1 for SparseCount2 and to a function of $\varepsilon$ for ApproxMC3, as observed in the discussion following Lemma 1. The usage of a $\mathsf{thresh}$ that depends on $\varepsilon$ requires stronger bounds on variance, which cannot be provided by sparse XORs, as discussed in the previous section.
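The binary search enabled by monotonicity, followed by a median over repeated runs, can be sketched in Python as follows. This is only an illustration under stated assumptions: the satisfiability oracle is replaced by an ordinary Python predicate `cell_is_nonempty`, the function names are hypothetical, and the sketch omits the formula copying and the hash sampling of the actual algorithm.

```python
from statistics import median

def largest_nonempty_prefix(cell_is_nonempty, n):
    """Binary search for the largest i in [0, n] whose cell is nonempty.

    Assumes monotonicity (a nonempty cell at i+1 implies a nonempty cell at i)
    and that the cell at i = 0 is nonempty, i.e. the formula is satisfiable.
    """
    lo, hi = 0, n  # invariant: the cell at lo is nonempty
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if cell_is_nonempty(mid):  # one oracle call per iteration
            lo = mid
        else:
            hi = mid - 1
    return lo

def median_of_runs(run_once, T):
    """Repeat an independent randomized run T times and return the median result."""
    return median([run_once() for _ in range(T)])

# Toy usage: cells are nonempty exactly up to prefix length 137 out of n = 1000.
calls = 0
def toy_oracle(i):
    global calls
    calls += 1
    return i <= 137

boundary = largest_nonempty_prefix(toy_oracle, 1000)
print(boundary, calls)
```

A linear scan over the same range would need up to n oracle calls per run, whereas the loop above halves the candidate range on every call.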

Analysis of Correctness of SparseCount2

We now present the theoretical analysis of SparseCount2. It is worth asserting that the proof structure and technique for SparseCount2 and ApproxMC3 are significantly different, as is evident from the inability of ApproxMC3 to use sparse XORs. Therefore, while the algorithmic change might look minor, the proof of correctness requires a different analysis.

Theorem 2

Let Inline graphic employ hash families, where is chosen such that it guarantees weak -concentration for the random variable for all i, then returns count c such that

Proof

Similar to [31], we assume that Inline graphic is a power of 2; a relaxation of the assumption simply introduces a constant factor in the approximation. Let and for we define the variable to denote the value of when iter = t. Let . Note that the choice of ensures that is weakly concentrated.

Let Inline graphic denote the event that or . We denote the event as and the event as . Note that . We now compute and as follows:

From Algorithm 1, we have Median. For , we have that for at least iterations of returns . For t-th invocation of (i.e., ) to return , then it is necessarily the case that . Since is chosen such that the resulting hash function guarantees -concentration for the random variable , we have for . Let us denote, by , the event that at least for of we have . Therefore, by Chernoff bound we have where . By applying union bound, we have
Again, from the Algorithm 1, we have Median. Therefore, for , we have at least invocations of return . For t-th invocation of (i.e., ) to return m, then it is necessarily the case that . Noting, . For , we have for
Let us denote by , the event that for at least of values, we have . By Chernoff bound for , we have where . By applying union bound, we have .

Therefore, we have Inline graphic . Substituting T, we have

Now notice that Inline graphic ; Therefore, ensures that we have . Therefore,

Empirical Studies

We focus on an empirical study comparing the runtime performance of SparseCount, SparseCount2, and ApproxMC3. All three algorithms are implemented in C++ and use the same underlying SAT solver, CryptoMiniSat [26], augmented with the BIRD framework introduced in [24, 25]. CryptoMiniSat augmented with BIRD is a state of the art SAT solver equipped to handle XOR constraints natively. It is worth noting that for hashing-based techniques, over 99% of the runtime is consumed by the underlying SAT solver [25]. Therefore, the difference in the performance of the algorithms is primarily due to the number of SAT calls and the formulas over which the SAT solver is invoked. Furthermore, our empirical conclusions do not change even when using older versions of CryptoMiniSat.

We conducted experiments on a wide variety of publicly available benchmarks. Our benchmark suite consists of 1896 formulas arising from probabilistic inference in grid networks, synthetic grid structured random interaction Ising models, plan recognition, DQMR networks, bit-blasted versions of SMTLIB benchmarks, ISCAS89 combinational circuits, and program synthesis examples. Every experiment consisted of running a counting algorithm on a particular instance with a timeout of 4500 s. The experiments were conducted on a high-performance cluster, where each node has an E5-2690 v3 CPU with 24 cores and 96 GB of RAM. We set $\varepsilon = 0.8$ and $\delta = 0.2$ for all the tools.

The objective of our empirical study was to seek answers to the following questions:

1. How does SparseCount compare against SparseCount2 in terms of runtime performance?
2. How does SparseCount2 perform against ApproxMC3 in terms of runtime?

Overall, we observe that SparseCount2 significantly outperforms SparseCount. On the other hand, ApproxMC3 outperforms SparseCount2 with a mean speedup of 568.53×.

Our conclusions are surprising and stand in stark contrast to the widely held belief that the current construction of sparse XORs by Zhao et al. [31] and Ermon et al. [11] leads to runtime improvement [1, 18, 19].

Figure 1 shows the cactus plot for SparseCount, SparseCount2, and ApproxMC3. We present the number of benchmarks on the x-axis and the time taken on the y-axis. A point (x, y) implies that x benchmarks took less than or equal to y seconds for the corresponding tool to execute. We present a runtime comparison of SparseCount2 vis-a-vis SparseCount and ApproxMC3 in Table 1. Column 1 of this table gives the benchmark name, while columns 2 and 3 list the number of variables and clauses, respectively. Columns 4, 5, and 6 list the runtime (in seconds) of SparseCount, SparseCount2, and ApproxMC3, respectively. Note that “TO” stands for timeout. For lack of space, we present results only on a subset of benchmarks. The detailed logs, along with the list of benchmarks and the binaries employed to run the experiments, are available at http://doi.org/10.5281/zenodo.3792748
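The construction of cactus-plot points described above can be sketched in a few lines of Python. The function name `cactus_points` and the convention of encoding a timeout as `None` are assumptions for this example; the three finite runtimes are copied from the second runtime column of Table 1.

```python
def cactus_points(runtimes, timeout):
    """Return points (x, y): x benchmarks each finished within y seconds.

    `runtimes` holds one entry per benchmark: seconds taken, or None for a timeout.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return list(enumerate(solved, start=1))

# Three finished runs and one timeout, with the 4500 s limit used in the experiments.
points = cactus_points([23.43, None, 8.49, 30.28], timeout=4500)
print(points)
```

Benchmarks that time out contribute no point, so a curve that extends further to the right corresponds to a tool that solved more instances.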

Fig. 1. Cactus plot of runtime performance (best viewed in color)

Table 1.

Comparison of runtimes of SparseCount, SparseCount2, and ApproxMC3

Benchmark (.cnf)	Vars	Clauses	SparseCount time (s)	SparseCount2 time (s)	ApproxMC3 time (s)
blasted_case200	14	42	18.13	8.49	0.01
blasted_case60	15	35	350.48	23.43	0.01
s27_3_2	20	43	1581.63	30.28	0.01
SetTest.sk_9_21	33744	148948	1679.62	171.02	0.81
lss.sk_6_7	82362	259552	1959.39	405.61	1.63
registerlesSwap.sk_3_10	372	1493	2498.02	60.23	0.03
polynomial.sk_7_25	313	1027	2896.49	99.94	0.02
02A-3	5488	21477	3576.82	467.26	0.06
blasted_case24	65	190	TO	125.25	0.05
ConcreteActivityService.sk_13_28	2481	9011	TO	467.97	0.84
GuidanceService2.sk_2_27	715	2181	TO	498.14	0.29
ActivityService2.sk_10_27	1952	6867	TO	505.23	0.5
UserServiceImpl.sk_8_32	1509	5009	TO	511.09	0.33
or-100-10-4-UC-60	200	500	TO	608.86	0.05
02A-2	3857	15028	TO	1063.67	0.05
LoginService2.sk_23_36	11511	41411	TO	1127.36	2.96
17.sk_3_45	10090	27056	TO	1299.15	1.69
diagStencil.sk_35_36	319730	1774184	TO	2188.19	112.52
tableBasedAddition.sk_240_1024	1026	961	TO	TO	2.17
blasted_squaring9	1434	5028	TO	TO	5.04
blasted_TR_b12_1_linear	1914	6619	TO	TO	259.3


We present relative comparisons separately for ease of exposition and clarity.

SparseCount vis-a-vis SparseCount2

As shown in Fig. 1, with a timeout of 4500 s, SparseCount could only finish execution on 90 benchmarks while SparseCount2 completed on 379 benchmarks. Note that SparseCount2 retains the same theoretical guarantees as SparseCount.

For a clear picture of the performance gain achieved by SparseCount2 over SparseCount, we turn to Table 1. Table 1 clearly demonstrates that SparseCount2 outperforms SparseCount significantly. In particular, for all the benchmarks where neither SparseCount nor SparseCount2 timed out, the mean speedup is 10.94.

Explanation. The stark difference in the performance of SparseCount and SparseCount2 is primarily due to a significant reduction in the number of SAT calls in SparseCount2. Recall that SparseCount invokes the underlying SAT solver $\mathcal{O}(n \log n)$ times while SparseCount2 invokes the underlying SAT solver only $\mathcal{O}((\log n)^2)$ times. As discussed above, such a difference was achieved due to the usage of prefix-slices.
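As an illustration of how a per-benchmark speedup can be computed, the Python sketch below uses the eight rows of Table 1 in which the first two runtime columns both report a finite time. Because Table 1 lists only a subset of the benchmarks, this is not expected to reproduce the mean reported over the full suite, and the choice between arithmetic and geometric mean is an assumption of the sketch, since the text does not state which one is used.

```python
import math

# (first runtime column, second runtime column) in seconds, copied from Table 1
# for the rows where neither entry is a timeout.
rows = [
    (18.13, 8.49), (350.48, 23.43), (1581.63, 30.28), (1679.62, 171.02),
    (1959.39, 405.61), (2498.02, 60.23), (2896.49, 99.94), (3576.82, 467.26),
]

# Speedup of the second tool over the first, per benchmark.
speedups = [older / newer for older, newer in rows]

arithmetic_mean = sum(speedups) / len(speedups)
geometric_mean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))

print(f"arithmetic mean speedup: {arithmetic_mean:.2f}")
print(f"geometric mean speedup:  {geometric_mean:.2f}")
```

The geometric mean is less sensitive to a few very large ratios, which is why both are printed here for comparison.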

SparseCount2 vis-a-vis ApproxMC3

With a timeout of 4500 s, SparseCount2 could only finish execution on 379 benchmarks while ApproxMC3 finished execution on 1169 benchmarks. Furthermore, Table 1 clearly demonstrates that ApproxMC3 significantly outperforms SparseCount2. In particular, for all the formulas where neither SparseCount2 nor ApproxMC3 timed out, the mean speedup is 568.53. In light of recent improvements in CryptoMiniSat, one may wonder if the observations reported in this paper are mere artifacts of how the SAT solvers have changed in the past few years, and whether such a study on an earlier version of CryptoMiniSat may have led to a different conclusion. To account for this threat to validity, we conducted preliminary experiments using old versions of CryptoMiniSat and again observed that similar observations hold. In particular, the latest improvements in CryptoMiniSat, such as the BIRD framework [24, 25], favor SparseCount and SparseCount2 relative to ApproxMC3.

Explanation. The primary contributing factor for the difference in the runtime performance of SparseCount2 and ApproxMC3 is the fact that weaker guarantees for the variance of $\mathsf{Cnt}_{\langle \varphi, i \rangle}$ necessitate the usage of Stockmeyer’s amplification technique, wherein the underlying routines are invoked over the amplified formula (multiple copies of the input) instead of $\varphi$. Furthermore, the weak theoretical guarantees also lead to a larger value of $T$ as compared to its analogous parameter in ApproxMC3. It is worth noticing that prior work on the design of sparse hash functions has claimed that the usage of sparse hash functions leads to runtime performance improvement of the underlying techniques. Such an inference may perhaps be drawn based only on observing the time taken by a SAT solver on a CNF formula with a fixed number of XORs while varying only the density of XORs. While such an observation does indeed highlight that sparse XORs are easy for SAT solvers, it fails, as has been the case in prior work, to take into account the tradeoffs due to the weakness of theoretical guarantees of sparse hash functions. To emphasize this further, the best known theoretical guarantees offered by sparse XORs are so weak that one cannot merely replace the dense XORs with sparse XORs. The state of the art counters such as ApproxMC3 require stronger guarantees than those known today.

Conclusion

Hashing-based techniques have emerged as a promising paradigm to attain scalability and rigorous guarantees in the context of approximate model counting. Since the performance of SAT solvers was observed to degrade with an increase in the size of XORs, efforts have focused on the design of sparse XORs. In this paper, we performed the first rigorous analysis to understand the theoretical and empirical effect of sparse XORs. Our conclusions are surprising and stand in stark contrast to the widely held belief that the current construction of sparse XORs by Zhao et al. [31] and Ermon et al. [11] leads to runtime improvement. We demonstrate that the theoretical guarantees offered by the aforementioned construction are still too weak to be employed in state of the art approximate counters such as ApproxMC3. Furthermore, the most efficient algorithm using sparse XORs, to the best of our knowledge, still falls significantly short of ApproxMC3 in runtime performance. While our analysis leads to negative results for the current state of the art sparse construction of hash functions, we hope our work will motivate other researchers in the community to investigate the construction of efficient hash functions rigorously. In this spirit, concurrent work of Meel ⓡ Akshay [20] proposes a new family of hash functions called concentrated hash functions, and designs a new family of sparse hash functions of the form $Ay + b$ wherein every entry of $A[i]$ is chosen with probability $p_i \in \mathcal{O}\left(\frac{\log i}{i}\right)$. Meel ⓡ Akshay propose an adaptation of ApproxMC3 that can make use of the newly designed sparse hash functions and, in turn, obtain promising speedups on a subset of benchmarks.

Acknowledgments

The authors would like to sincerely thank the anonymous reviewers of AAAI-20 and SAT-20 (and in particular, Meta-Reviewer of AAAI-20) for providing detailed, constructive criticism that has significantly improved the quality of the paper. We are grateful to Mate Soos for many insightful discussions.

This work was supported in part by National Research Foundation Singapore under its NRF Fellowship Programme [NRF-NRFFAI1-2019-0004] and AI Singapore Programme [AISG-RP-2018-005], and NUS ODPRT Grant [R-252-000-685-13]. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore https://www.nscc.sg.

Footnotes

Pairwise independent hash functions were initially referred to as strongly 2-universal hash functions in [6]. The prior work on approximate counting often uses the term 2-universal hashing to refer to strongly 2-universal hash functions.

The symbol ⓡ is used to denote random author ordering, as suggested by the authors.

While we focus on the state-of-the-art counter, the same requirement holds for other Cat2 techniques.

The expression stated in the Theorem can be found in the revised version at https://cs.stanford.edu/~ermon/papers/SparseHashing-revised.pdf (Accessed: May 10, 2020).

The author list has been sorted alphabetically by last name; this order should not be used to determine the extent of authors’ contributions. Part of the work was carried out during the first two authors’ internships at National University of Singapore.

Contributor Information

Luca Pulina, Email: lpulina@uniss.it.

Martina Seidl, Email: martina.seidl@jku.at.

Kuldeep S. Meel, Email: meel@comp.nus.edu.sg

References

1.Achlioptas, D., Hammoudeh, Z., Theodoropoulos, P.: Fast and flexible probabilistic model counting. In: Beyersdorff, O., Wintersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 148–164. Springer, Cham (2018). 10.1007/978-3-319-94144-8_10
2.Achlioptas, D., Theodoropoulos, P.: Probabilistic model counting with short XORs. In: Gaspers, S., Walsh, T. (eds.) SAT 2017. LNCS, vol. 10491, pp. 3–19. Springer, Cham (2017). 10.1007/978-3-319-66263-3_1
3.Asteris, M., Dimakis, A.G.: LDPC codes for discrete integration. Technical report, UT Austin (2016)
4.Baluta, T., Shen, S., Shinde, S., Meel, K.S., Saxena, P.: Quantitative verification of neural networks and its security applications. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1249–1264 (2019)
5.Biondi, F., Enescu, M.A., Heuser, A., Legay, A., Meel, K.S., Quilbeuf, J.: Scalable approximation of quantitative information flow in programs. In: VMCAI 2018. LNCS, vol. 10747, pp. 71–93. Springer, Cham (2018). 10.1007/978-3-319-73721-8_4
6.Carter, J.L., Wegman, M.N.: Universal classes of hash functions. In: Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, pp. 106–112. ACM (1977)
7.Chakraborty, S., Meel, K.S., Mistry, R., Vardi, M.Y.: Approximate probabilistic inference via word-level counting. In: Proceedings of AAAI (2016)
8.Chakraborty, S., Meel, K.S., Vardi, M.Y.: A scalable approximate model counter. In: Proceedings of CP, pp. 200–216 (2013)
9.Chakraborty, S., Meel, K.S., Vardi, M.Y.: Algorithmic improvements in approximate counting for probabilistic inference: from linear to logarithmic SAT calls. In: Proceedings of IJCAI (2016)
10.Duenas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y.: Counting-based reliability estimation for power-transmission grids. In: Proceedings of AAAI, February 2017
11.Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B.: Low-density parity constraints for hashing-based discrete integration. In: Proceedings of ICML, pp. 271–279 (2014)
12.Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B.: Optimization with parity constraints: from binary codes to discrete integration. In: Proceedings of UAI (2013)
13.Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B.: Taming the curse of dimensionality: discrete integration by hashing and optimization. In: Proceedings of ICML, pp. 334–342 (2013)
14.Gomes, C.P., Hoffmann, J., Sabharwal, A., Selman, B.: Short XORs for model counting: from theory to practice. In: Marques-Silva, J., Sakallah, K.A. (eds.) Theory and Applications of Satisfiability Testing – SAT 2007, pp. 100–106. Springer, Heidelberg (2007)
15.Gomes, C.P., Sabharwal, A., Selman, B.: Model counting: a new strategy for obtaining good bounds. In: Proceedings of AAAI, vol. 21, pp. 54–61 (2006)
16.Gomes, C.P., Sabharwal, A., Selman, B.: Model counting. In: Biere, A., Heule, M., Maaren, H.V., Walsh, T. (eds.) Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185, pp. 633–654. IOS Press (2009)
17.Ivrii, A., Malik, S., Meel, K.S., Vardi, M.Y.: On computing minimal independent support and its applications to sampling and counting. Constraints, 1–18 (2015). 10.1007/s10601-015-9204-z
18.Kuck, J., Dao, T., Zhao, S., Bartan, B., Sabharwal, A., Ermon, S.: Adaptive hashing for model counting. In: Conference on Uncertainty in Artificial Intelligence (2019)
19.Kuck, J., Sabharwal, A., Ermon, S.: Approximate inference via weighted Rademacher complexity. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
20.Meel, K.S., Akshay, S.: Sparse hashing for scalable approximate model counting: theory and practice. In: Proceedings of LICS (2020)
21.Meel, K.S., et al.: Constrained sampling and counting: universal hashing meets SAT solving. In: Proceedings of Beyond NP Workshop (2016)
22.Roth, D.: On the hardness of approximate reasoning. Artif. Intell. 82(1), 273–302 (1996). 10.1016/0004-3702(94)00092-1
23.Sang, T., Beame, P., Kautz, H.: Performing Bayesian inference by weighted model counting. In: Proceedings of AAAI, pp. 475–481 (2005)
24.Soos, M., Gocht, S., Meel, K.S.: Accelerating approximate techniques for counting and sampling models through refined CNF-XOR solving. In: Proceedings of International Conference on Computer-Aided Verification (CAV), July 2020
25.Soos, M., Meel, K.S.: BIRD: engineering an efficient CNF-XOR SAT solver and its applications to approximate model counting. In: Proceedings of AAAI Conference on Artificial Intelligence (AAAI) (2019)
26.Soos, M., Nohl, K., Castelluccia, C.: Extending SAT solvers to cryptographic problems. In: Kullmann, O. (ed.) Theory and Applications of Satisfiability Testing - SAT 2009, pp. 244–257. Springer, Heidelberg (2009)
27.Stockmeyer, L.: The complexity of approximate counting. In: Proceedings of STOC, pp. 118–126 (1983)
28.Toda, S.: On the computational power of PP and ⊕P. In: Proceedings of FOCS, pp. 514–519. IEEE (1989)
29.Trevisan, L.: Lecture notes on computational complexity. Notes written in Fall (2002). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.9877&rep=rep1&type=pdf
30.Valiant, L.: The complexity of enumeration and reliability problems. SIAM J. Comput. 8(3), 410–421 (1979). 10.1137/0208032
31.Zhao, S., Chaturapruek, S., Sabharwal, A., Ermon, S.: Closing the gap between short and long XORs for model counting. In: Proceedings of AAAI (2016)

