
## Optimization With Parity Constraints: From Binary Codes to Discrete Integration

Citations: 5 (3 self)

### Citations

1790 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
- 2001
Citation Context ...tive Message Passing (MP) methods are among the most widely used decoding techniques. Although the decoding problem is computationally intractable, they usually have very good performance in practice [7, 19]. Since we can represent parity constraints as additional factors in our original factor graph model, MP techniques can also be heuristically applied to solve the more general MAP inference queries with par...

1524 | Local computations with probabilities on graphical structures and their applications to expert systems (with discussion)
- Lauritzen, Spiegelhalter
- 1988
Citation Context ...bound, and the Mean Field approach [31] which gives a provable lower bound. We use the implementations in the LibDAI library [22] and compare with ground truth obtained using the Junction Tree method [20]. Figure 4 shows the error in the resulting estimates, together with the upper and lower bounds obtained with WISH augmented with Toeplitz-matrix hashing and CPLEX. We immediately see that our lower b...

915 | Probabilistic Graphical Models: Principles and Techniques
- Koller, Friedman
- 2009
Citation Context ... and upper bounds on the partition function, which hold with high probability and are much tighter than those obtained with variational methods. 1 INTRODUCTION Discrete probabilistic graphical models [18, 31] are often defined up to a normalization factor involving a summation over an exponentially large combinatorial space. Computing these factors is an important problem, as they are needed, for instance...

847 | Universal classes of hash functions
- Carter, Wegman
- 1979
Citation Context ...inary variables. 3 BACKGROUND This paper extends previous work by Ermon et al. [6] who introduced an algorithm called WISH to estimate the partition function (2). WISH is a randomized approximation algorithm that gives a constant factor approximation of Z with high probability. It involves solving a polynomial number of MAP inference queries for the graphical model conditioned on randomly generated evidence based on universal hashing. 3.1 FAMILIES OF HASH FUNCTIONS A key ingredient of the WISH algorithm is the concept of pairwise independent hashing, originally introduced by Carter and Wegman [5] and later recognized as a tool that “should belong to the bag of tricks of every computer scientist” [33]. There are several in-depth expositions of the topic [cf. 12, 27, 28]. Here we will also make use of a weaker notion of hashing, called uniform hashing and defined as follows: Definition 1. A family of functions H = {h : {0,1}^n → {0,1}^m} is called uniform if for H ∈R H it holds that ∀x ∈ {0,1}^n, the random variable H(x) is uniformly distributed in {0,1}^m. Here we use the notation H ∈R H to denote H being chosen uniformly at random from H. Definition 2. A family of functions H = {h : {0...
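The pairwise-independent family used throughout this context (and made explicit in Proposition 2) is h_{A,b}(x) = Ax + b mod 2 with A and b drawn uniformly at random. A minimal NumPy sketch of sampling from that family; the helper name `sample_hash` and the toy sizes are ours:

```python
import numpy as np

def sample_hash(n, m, rng):
    """Draw h_{A,b}(x) = Ax + b mod 2 with A, b uniformly random.

    For any x != y, the pair (h(x), h(y)) is uniform over
    {0,1}^m x {0,1}^m -- the pairwise-independence property.
    """
    A = rng.integers(0, 2, size=(m, n))
    b = rng.integers(0, 2, size=m)
    return lambda x: (A @ np.asarray(x) + b) % 2

rng = np.random.default_rng(0)
h = sample_hash(n=6, m=3, rng=rng)
bucket = h([1, 0, 1, 1, 0, 0])  # an element of {0,1}^3
```

Conditioning a graphical model on h(x) = 0 adds m random parity (XOR) constraints, which is exactly the evidence WISH generates.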

819 | Graphical models, exponential families, and variational inference. Foundations and Trends®
- Wainwright, Jordan
- 2008
Citation Context ...nnection with max-likelihood decoding of binary codes, we show that these optimizations are computationally hard. Inspired by iterative message passing decoding algorithms, we propose an Integer Linear Programming (ILP) formulation for the problem, enhanced with new sparsification techniques to improve decoding performance. By solving the ILP through a sequence of LP relaxations, we get both lower and upper bounds on the partition function, which hold with high probability and are much tighter than those obtained with variational methods. 1 INTRODUCTION Discrete probabilistic graphical models [18, 31] are often defined up to a normalization factor involving a summation over an exponentially large combinatorial space. Computing these factors is an important problem, as they are needed, for instance, to evaluate the probability of evidence, rank two alternative models, and learn parameters from data. Unfortunately, computing these discrete integrals exactly in very high dimensional spaces quickly becomes intractable, and approximation techniques are often needed. Among them, sampling and variational methods are the most popular approaches. Variational inference problems are typically solved ...

676 | Loopy belief propagation for approximate inference: an empirical study
- Murphy, Weiss, et al.
- 1999
Citation Context ...lly, there are M^2 binary variables, with single node potentials ψi(xi) = exp(fi xi) and pairwise interactions ψij(xi, xj) = exp(wij xi xj), where wij ∈R [−w, w] and fi ∈R [−f, f]. We compare with Loopy BP [23] which estimates Z, Tree Reweighted BP [30] which gives a provable upper bound, and the Mean Field approach [31] which gives a provable lower bound. We use the implementations in the LibDAI library [2...

549 | Introduction to Linear Optimization. Athena Scientific
- Bertsimas, Tsitsiklis
- 1997
Citation Context ...icitly, we can directly use these techniques. 5 INTEGER PROGRAMMING FORMULATION The NP-hard combinatorial optimization problem max_σ w(σ) subject to Aσ = b mod 2 can be formulated as an Integer Program [4]. This is a promising approach because Integer Linear Programs and related linear programming (LP) relaxations have been shown to be very effective at decoding binary codes by Feldman et al. [7]. Fu...
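As a correctness reference for the parity-constrained optimization max_σ w(σ) subject to Aσ = b mod 2, the problem can be brute-forced for tiny n. This is only a sketch for intuition (helper name, toy weight function, and toy instance are ours), not the ILP approach the paper proposes:

```python
import itertools
import numpy as np

def max_weight_with_parity(w, A, b):
    """Brute-force max of w(sigma) over {0,1}^n subject to A sigma = b (mod 2).

    Exponential in n -- a correctness reference only; an ILP solver
    replaces this enumeration in practice.
    """
    n = A.shape[1]
    best_val, best_sigma = None, None
    for bits in itertools.product([0, 1], repeat=n):
        sigma = np.array(bits)
        if np.array_equal((A @ sigma) % 2, b % 2):
            val = w(sigma)
            if best_val is None or val > best_val:
                best_val, best_sigma = val, sigma
    return best_val, best_sigma

# toy weight: independent single-variable factors exp(f_i * x_i)
f = np.array([0.5, -1.0, 2.0, 0.3])
w = lambda s: float(np.exp(f @ s))
A = np.array([[1, 1, 0, 1]])  # one XOR: x1 + x2 + x4 = 0 (mod 2)
b = np.array([0])
val, sigma = max_weight_with_parity(w, A, b)
```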

321 | On the inherent intractability of certain coding problems
- Berlekamp, McEliece, van Tilborg
- 1978
Citation Context ...th random parity constraints arising from the WISH scheme. These optimization problems turn out to be intimately connected with the fundamental problem of maximum likelihood decoding of a binary code [3, 29]. We leverage this connection to show that the inference queries generated by WISH are NP-hard to solve and to approximate, even for very simple graphical models. Although generally hard in the worst case,...

276 | The Markov chain Monte Carlo method: an approach to approximate counting and integration
- Jerrum, Sinclair
- 1996
Citation Context ...[footnote: This work was supported by NSF Grant 0832782.] ...which are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Re...

183 | Using linear programming to decode binary linear codes
- Feldman, Wainwright, et al.
- 2005
Citation Context ...tive Message Passing (MP) methods are among the most widely used decoding techniques. Although the decoding problem is computationally intractable, they usually have very good performance in practice [7, 19]. Since we can represent parity constraints as additional factors in our original factor graph model, MP techniques can also be heuristically applied to solve the more general MAP inference queries with par...

170 | The hardness of approximate optima in lattices, codes, and systems of linear equations.
- Arora, Babai, et al.
- 1993
Citation Context ...ses: are there interesting counting problems for which we can approximate max_σ w(σ) subject to Aσ = b mod 2 in polynomial time? To shed some light on this question, we show a connection with a decision problem arising in coding theory: Definition 3 (MAXIMUM-LIKELIHOOD DECODING). Given a binary m × n matrix A, a vector b ∈ {0,1}^m, and an integer w > 0, is there a vector z ∈ {0,1}^n of Hamming weight ≤ w, such that Az = b mod 2? As noted by Vardy [29], Berlekamp et al. [3] showed that this problem is NP-complete with a reduction from 3-DIMENSIONAL MATCHING. Further, Stern [26] and Arora et al. [2] proved that even approximating within any constant factor the solution to this problem is NP-hard. These hardness results restrict the kind of problems we can hope to solve in our setting, which is more general. In fact, we can define a graphical model with single variable factors ψi(xi) = exp(−xi) for xi ∈ {0,1}. Let S = {x ∈ {0,1}^n : Ax = b mod 2}. Then max_{x∈S} w(x) = max_{x∈S} ∏_{i=1}^n ψi(xi) = exp(max_{x∈S} Σ_{i=1}^n log ψi(xi)) = exp(max_{x∈S} −H(x)) = exp(−min_{x∈S} H(x)), where H(x) is the Hamming weight of x. Thus, MAXIMUM-LIKELIHOOD DECODING of a binary code is a special case of MAP in...
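The chain of equalities above, relating the MAP value under ψi(xi) = exp(−xi) to the minimum Hamming weight over the code, can be checked numerically on a small instance (the matrix A and vector b below are arbitrary toy choices of ours):

```python
import itertools
import numpy as np

# ML decoding as MAP: with psi_i(x_i) = exp(-x_i),
#   max_{x in S} prod_i psi_i(x_i) = exp(-min_{x in S} H(x)),
# where H is Hamming weight and S = {x : Ax = b mod 2}.
A = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0]])
b = np.array([1, 1])
S = [np.array(x) for x in itertools.product([0, 1], repeat=4)
     if np.array_equal((A @ np.array(x)) % 2, b)]

map_value = max(np.exp(-x.sum()) for x in S)  # MAP objective
min_weight = min(x.sum() for x in S)          # decoding objective
assert np.isclose(map_value, np.exp(-min_weight))
```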

160 | Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations.
- Globerson, Jaakkola
- 2007
Citation Context ...ative message-passing decoding algorithms are closely related to LP relaxations of certain Integer Programs, either because they are directly trying to solve an LP or its dual like the MPLP and TRWBP [9, 25, 30], or attempting to approximately solve a variational problem over the same polytope like Loopy Belief Propagation [31]. 5.1 MAP INFERENCE AS AN ILP For simplicity, we consider the case of binary factors (pa...

146 | Expressing combinatorial optimization problems by linear programs
- Yannakakis
- 1991
Citation Context ...= {S ⊆ N(j) : |S| even} there is an extra binary variable wj,S ∈ {0,1}. It requires ∀j ∈ J, Σ_{S∈Ej} wj,S = 1 and ∀j ∈ J, ∀i ∈ N(j), µi = Σ_{S∈Ej : i∈S} wj,S. 5.2.2 Compact polytope representation Yannakakis [34] introduced the following compact representation which requires only O(n^3) variables and constraints, where n is the number of variables. For each constraint j, define Tj = {0, 2, ..., 2⌊|N(j)|/2⌋} as...
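The even-subset encoding in this context picks one selector variable w_{j,S} per even subset S of the XOR's scope; the selected S determines which variables are set to 1, so any consistent integer point has even parity. A small enumeration sketch (helper name `even_subsets` is ours; this covers the bj = 0, i.e. even-parity, case):

```python
from itertools import combinations

def even_subsets(scope):
    """All even-cardinality subsets S of a parity constraint's scope N(j).

    In the cited encoding, each S gets a selector w_{j,S} with
    sum_S w_{j,S} = 1 and mu_i = sum_{S : i in S} w_{j,S}; choosing one S
    sets exactly the variables in S to 1, satisfying the even-parity XOR.
    """
    return [set(c) for k in range(0, len(scope) + 1, 2)
            for c in combinations(scope, k)]

scope = [0, 1, 2]  # one XOR over three variables
sets_ = even_subsets(scope)
# selecting any S recovers a mu of even parity
for S in sets_:
    mu = [1 if i in S else 0 for i in scope]
    assert sum(mu) % 2 == 0
```

There are 2^(|N(j)|−1) such subsets, which is why this representation is exponential in the XOR length.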

139 | The Matrix Cookbook.
- Petersen, Pedersen
- 2012
Citation Context ... TOEPLITZ MATRIX The performance of Algorithm 1 can be improved by constructing pairwise independent hash functions not by choosing A ∈R {0,1}^{i×n} but rather letting A be a random i×n Toeplitz matrix [24]. Specifically, the first column and row of A are filled with uniform i.i.d. Bernoulli variables in {0,1}. The value of each entry is then copied into the corresponding descending top-left to bottom-ri...

112 | Tightening LP relaxations for MAP using message passing.
- Sontag, Meltzer, et al.
- 2008
Citation Context ...ative message-passing decoding algorithms are closely related to LP relaxations of certain Integer Programs, either because they are directly trying to solve an LP or its dual like the MPLP and TRWBP [9, 25, 30], or attempting to approximately solve a variational problem over the same polytope like Loopy Belief Propagation [31]. 5.1 MAP INFERENCE AS AN ILP For simplicity, we consider the case of binary factors (pa...

88 | Linear programming relaxations and belief propagation – An empirical study.
- Yanover, Meltzer, et al.
- 2006
Citation Context ...te that other techniques such as by Sontag et al. [25] could also be used to iteratively tighten the LP relaxation, and might lead to better scaling behavior on certain classes of very large problems [35]. [figure residue: upper-bound plot, legend “UB − 25/50/75 constraints”] ...define the same set of solutions, namely {x ∈ {0,1}^n : Ax = b} = {x ∈ {0,1}^n : A′x = b′} but are muc...

78 | libDAI: A free and open source C++ library for discrete approximate inference in graphical models.
- Mooij
- 2010
Citation Context ...3] which estimates Z, Tree Reweighted BP [30] which gives a provable upper bound, and the Mean Field approach [31] which gives a provable lower bound. We use the implementations in the LibDAI library [22] and compare with ground truth obtained using the Junction Tree method [20]. Figure 4 shows the error in the resulting estimates, together with the upper and lower bounds obtained with WISH augmented ...

76 | Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching
- Wainwright, Jaakkola, et al.
- 2003
Citation Context ...is an important problem, as they are needed, for instance, to evaluate the probability of evidence, rank two alternative models, and learn parameters from data. Unfortunately, computing these discrete integrals exactly in very high dimensional spaces quickly becomes intractable, and approximation techniques are often needed. Among them, sampling and variational methods are the most popular approaches. Variational inference problems are typically solved using message passing techniques, [footnote: This work was supported by NSF Grant 0832782.] which are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Recently, Ermon et al. [6] introduced a new technique called WISH which comes with provable (probabilistic) guarantees on the approximation error. Their method combines combinatorial optimization techniques with the use of universal hash functions to uniformly partition a large combinatorial space, or...

49 | Lectures on Monte Carlo Methods
- Madras
- 2002
Citation Context ...[footnote: This work was supported by NSF Grant 0832782.] ...which are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Re...

45 | Model counting: A new strategy for obtaining good bounds.
- Gomes, Sabharwal, et al.
- 2006
Citation Context ... are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Recently, Ermon et al. [6] introduced a new techni...

44 | Algorithmic complexity in coding theory and the minimum distance problem.
- Vardy
- 1997
Citation Context ...th random parity constraints arising from the WISH scheme. These optimization problems turn out to be intimately connected with the fundamental problem of maximum likelihood decoding of a binary code [3, 29]. We leverage this connection to show that the inference queries generated by WISH are NP-hard to solve and to approximate, even for very simple graphical models. Although generally hard in the worst case,...

36 | Samplesearch: Importance sampling in presence of determinism.
- Gogate, Dechter
- 2011
Citation Context ... are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Recently, Ermon et al. [6] introduced a new techni...

35 | A new approach to model counting.
- Wei, Selman
- 2005
Citation Context ...[footnote: This work was supported by NSF Grant 0832782.] ...which are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Re...

29 | On the connections between universal hashing, combinatorial designs and error-correcting codes. Congressus Numerantium,
- Stinson
- 1996
Citation Context ...ft to bottom-right diagonal. This process requires n + i − 1 random bits rather than ni = O(n^2). Let T(m, n) ⊆ {0,1}^{m×n} be the set of m × n Toeplitz matrices with 0,1 entries. Then: Proposition 2 ([12, 27]). Let A ∈ T(m, n), b ∈ {0,1}^m. The family H^{n,m}_T = {h_{A,b}(x) : {0,1}^n → {0,1}^m} where h_{A,b}(x) = Ax + b mod 2 is a family of pairwise independent hash functions. WISH({H^{n,m}_T}) still provides the...

28 | SampleSearch: A scheme that searches for consistent samples.
- Gogate, Dechter
- 2007
Citation Context ... are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Recently, Ermon et al. [6] introduced a new techni...

28 | Near-uniform sampling of combinatorial spaces using XOR constraints.
- Gomes, Sabharwal, et al.
- 2006
Citation Context ...iversal hash functions to uniformly partition a large combinatorial space, originally introduced by Valiant and Vazirani to study the Unique Satisfiability problem and later exploited by Gomes et al. [13, 14] for solution counting. Specifically, they show that one can obtain the intractable normalization constant (partition function) of a graphical model within any desired degree of accuracy, by solving a...

28 | On defining sets of vertices of the hypercube by linear inequalities
- Jeroslow
- 1975
Citation Context ...ly the indexes of the non-zero columns of the j-th row of A. We’ll refer to |N(j)| as the length of the j-th XOR. 5.2.1 Exponential polytope representation The simplest encoding is due to Jeroslow [16]. It requires that for all j ∈ J, S ⊆ N(j), and |S| odd, the following should hold: Σ_{i∈S} µi + Σ_{i∈N(j)\S} (1 − µi) ≤ |N(j)| − 1. Clearly, this requires a number of constraints that is exponential in the l...
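Jeroslow's encoding works because the left-hand side equals |N(j)| exactly when µ restricted to the scope is the indicator of S, so each cut forbids one odd-parity 0/1 point. This can be checked exhaustively on a small XOR (helper names are ours; even-parity, i.e. bj = 0, case):

```python
from itertools import combinations, product

def jeroslow_cuts(scope):
    r"""Jeroslow's exponential encoding of an even-parity XOR constraint.

    For every odd subset S of the scope N(j), emit the linear cut
        sum_{i in S} mu_i + sum_{i in N(j)\S} (1 - mu_i) <= |N(j)| - 1,
    which cuts off exactly the 0/1 points of odd parity.
    """
    return [set(S)
            for k in range(1, len(scope) + 1, 2)  # odd |S| only
            for S in combinations(scope, k)]

scope = (0, 1, 2)
cuts = jeroslow_cuts(scope)

def feasible(mu):
    n = len(scope)
    return all(sum(mu[i] for i in S)
               + sum(1 - mu[i] for i in scope if i not in S)
               <= n - 1
               for S in cuts)

# integer points satisfying all cuts are exactly the even-parity ones
for mu in product([0, 1], repeat=3):
    assert feasible(mu) == (sum(mu) % 2 == 0)
```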

25 | Low-density parity-check codes. IRE Transactions on Information Theory
- Gallager
- 1962
Citation Context ...th random parity constraints arising from the WISH scheme. These optimization problems turn out to be intimately connected with the fundamental problem of maximum likelihood decoding of a binary code [3, 29]. We leverage this connection to show that the inference queries generated by WISH are NP-hard to solve and to approximate, even for very simple graphical models. Although generally hard in the worst case, message passing and related linear programming techniques [7] are known to be very successful in practice in decoding certain types of codes such as low density parity check (LDPC) codes [8]. Inspired by the success of these methods, we formulate the MAP inference queries generated by WISH as Integer Linear Programs (ILP). Unfortunately, such queries are typically harder than traditional decoding problems because they involve more complex probabilistic models, and because universal hash functions naturally give rise to very “dense” parity constraints. To address this issue, we propose a technique to construct equivalent but sparser (and empirically easier to solve) parity constraints. Further, we introduce a more general version of WISH that relies directly on arbitrarily sparse ... |

16 | Taming the curse of dimensionality: Discrete integration by hashing and optimization.
- Ermon, Gomes, et al.
- 2013
Citation Context ...ing techniques [10, 11, 13] are asymptotically correct, but the number of samples required to obtain a statistically reliable estimate can grow exponentially in the worst case. Recently, Ermon et al. [6] introduced a new technique called WISH which comes with provable (probabilistic) guarantees on the approximation error. Their method combines combinatorial optimization techniques with the use of universal...

12 | Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154
- Carter, Wegman
- 1979
Citation Context ...d evidence based on universal hashing. 3.1 FAMILIES OF HASH FUNCTIONS A key ingredient of the WISH algorithm is the concept of pairwise independent hashing, originally introduced by Carter and Wegman [5] and later recognized as a tool that “should belong to the bag of tricks of every computer scientist” [33]. There are several in-depth expositions of the topic [cf. 12, 27, 28]. Here we will also make ...

9 | Pseudorandomness. Foundations and Trends in Theoretical Computer Science
- 2011

8 | Randomized methods in computation. Lecture Notes
- Goldreich
- 2011
Citation Context ...ING USING TOEPLITZ MATRIX The performance of Algorithm 1 can be improved by constructing pairwise independent hash functions not by choosing A ∈R {0,1}^{i×n} but rather letting A be a random i × n Toeplitz matrix [24]. Specifically, the first column and row of A are filled with uniform i.i.d. Bernoulli variables in {0,1}. The value of each entry is then copied into the corresponding descending top-left to bottom-right diagonal. This process requires n + i − 1 random bits rather than ni = O(n^2). Let T(m, n) ⊆ {0,1}^{m×n} be the set of m × n Toeplitz matrices with 0,1 entries. Then: Proposition 2 ([12, 27]). Let A ∈ T(m, n), b ∈ {0,1}^m. The family H^{n,m}_T = {h_{A,b}(x) : {0,1}^n → {0,1}^m} where h_{A,b}(x) = Ax + b mod 2 is a family of pairwise independent hash functions. WISH({H^{n,m}_T}) still provides the same theoretical guarantees as Theorem 1 but has a more deterministic and stable behavior as it requires only Θ(n^2 log n) random bits rather than Θ(n^3 log n). 4 CONNECTIONS WITH CODING THEORY For a problem with n binary variables, WISH requires solving Θ(n log n) optimization instances. If these optimizations could be approximated (within a constant factor of the true optimal value) in polynomial tim...
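The Toeplitz construction described above (fill the first row and column with random bits, then copy along diagonals) is a few lines of NumPy; the helper name `random_toeplitz` and the toy sizes are ours:

```python
import numpy as np

def random_toeplitz(m, n, rng):
    """Random m x n 0/1 Toeplitz matrix from m + n - 1 random bits.

    Fill the first column and first row with i.i.d. Bernoulli(1/2) bits;
    every entry A[i, j] then depends only on the diagonal index j - i.
    """
    diag = rng.integers(0, 2, size=m + n - 1)  # indexed by (j - i) + (m - 1)
    A = np.empty((m, n), dtype=int)
    for i in range(m):
        for j in range(n):
            A[i, j] = diag[j - i + m - 1]
    return A

rng = np.random.default_rng(0)
A = random_toeplitz(3, 5, rng)
b = rng.integers(0, 2, size=3)
h = lambda x: (A @ x + b) % 2  # h_{A,b} as in Proposition 2
```

Note the bit savings: a fully random A needs m·n bits, while the Toeplitz version needs only m + n − 1 (plus m bits for b), which is the source of the Θ(n^2 log n) versus Θ(n^3 log n) total in the context above.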

6 | Toulbar2, an open source exact cost function network solver
- Allouche, Givry, et al.
- 2010
Citation Context ...y class believed to be even harder than NP. In practice, Ermon et al. [6] showed that the resulting MAP inference can be solved reasonably well using a state-of-the-art MAP inference engine called Toulbar [1], which was extended with custom propagators for parity constraints. Theorem 1 ([6]). For any δ > 0, positive constant α ≤ 0.0042, and the hash families H^{n,i} given by Proposition 1, WISH({H^{n,i}}) makes Θ(n ln...

6 | Approximating the number of error locations within a constant ratio is NP-complete
- Stern
- 1993
Citation Context ...0,1}^n of Hamming weight ≤ w, such that Az = b mod 2? As noted by Vardy [29], Berlekamp et al. [3] showed that this problem is NP-complete with a reduction from 3-DIMENSIONAL MATCHING. Further, Stern [26] and Arora et al. [2] proved that even approximating within any constant factor the solution to this problem is NP-hard. These hardness results restrict the kind of problems we can hope to solve in our settin...

5 | Randomized methods in computation
- Goldreich
- 2011
Citation Context ...ft to bottom-right diagonal. This process requires n + i − 1 random bits rather than ni = O(n^2). Let T(m, n) ⊆ {0,1}^{m×n} be the set of m × n Toeplitz matrices with 0,1 entries. Then: Proposition 2 ([12, 27]). Let A ∈ T(m, n), b ∈ {0,1}^m. The family H^{n,m}_T = {h_{A,b}(x) : {0,1}^n → {0,1}^m} where h_{A,b}(x) = Ax + b mod 2 is a family of pairwise independent hash functions. WISH({H^{n,m}_T}) still provides the...

2 | Lectures on the fusion method and derandomization
- Wigderson
- 1995
Citation Context ...thm is the concept of pairwise independent hashing, originally introduced by Carter and Wegman [5] and later recognized as a tool that “should belong to the bag of tricks of every computer scientist” [33]. There are several in-depth expositions of the topic [cf. 12, 27, 28]. Here we will also make use of a weaker notion of hashing, called uniform hashing and defined as follows: Definition 1. A family o...

1 | The hardness of approximate optima in lattices, codes, and systems of linear equations
- Arora, Babai, et al.
- 1993
Citation Context ...ht ≤ w, such that Az = b mod 2? As noted by Vardy [29], Berlekamp et al. [3] showed that this problem is NP-complete with a reduction from 3-DIMENSIONAL MATCHING. Further, Stern [26] and Arora et al. [2] proved that even approximating within any constant factor the solution to this problem is NP-hard. These hardness results restrict the kind of problems we can hope to solve in our setting, which is more gene...

1 | Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudo-moment matching
- Wainwright
- 2003
Citation Context ...hes. Variational inference problems are typically solved using message passing techniques, [footnote: This work was supported by NSF Grant 0832782.] which are often guaranteed to converge to some local minimum [30, 31], but without guarantees on the quality of the solution found. Markov Chain Monte Carlo [17, 21, 32] and Importance Sampling techniques [10, 11, 13] are asymptotically correct, but the number of sampl...