## Approximate Inference in Graphical Models using LP Relaxations (2010)

Citations: 12 (1 self-citation)

### BibTeX

@MISC{Sontag10approximateinference,

author = {David Alexander Sontag and S. Jaakkola},

title = {Approximate Inference in Graphical Models using LP Relaxations},

year = {2010}

}


### Abstract

Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show to be closely related to the first lifting of the Sherali-Adams hierarchy and which is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion.
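To make the gap between the pairwise LP relaxation and the cycle relaxation concrete, here is a minimal Python sketch for binary pairwise MRFs. It assumes edge pseudomarginals are stored as 2×2 tables `edge_marg[(i, j)][a][b]` standing for $\mu_{ij}(a, b)$, and uses the cycle inequality in its max-cut form $\sum_{ij \in F} c_{ij} - \sum_{ij \in C \setminus F} c_{ij} \leq |F| - 1$ for every odd-size subset $F$ of a cycle $C$, where $c_{ij} = \mu_{ij}(0,1) + \mu_{ij}(1,0)$ is the probability that edge $ij$ is cut. The helper names are hypothetical, and the brute-force enumeration of $F$ is only meant for short cycles.

```python
from itertools import combinations


def locally_consistent(node_marg, edge_marg, tol=1e-9):
    """Pairwise (local) consistency: each edge table must marginalize to its node vectors."""
    for (i, j), table in edge_marg.items():
        for a in (0, 1):
            # sum_b mu_ij(a, b) must equal mu_i(a)
            if abs(table[a][0] + table[a][1] - node_marg[i][a]) > tol:
                return False
        for b in (0, 1):
            # sum_a mu_ij(a, b) must equal mu_j(b)
            if abs(table[0][b] + table[1][b] - node_marg[j][b]) > tol:
                return False
    # every node vector must be a probability distribution
    return all(abs(sum(m) - 1.0) <= tol and min(m) >= -tol for m in node_marg.values())


def cut_probability(table):
    """Probability that the two endpoints of an edge take different values."""
    return table[0][1] + table[1][0]


def most_violated_on_cycle(cycle_edges, edge_marg):
    """Brute force over odd-size subsets F of the cycle's edges.

    Returns (F, violation) with violation = sum_F c - sum_{C minus F} c - (|F| - 1);
    a positive violation means the cycle inequality for (C, F) is violated.
    """
    cut = {e: cut_probability(edge_marg[e]) for e in cycle_edges}
    best = None
    for size in range(1, len(cycle_edges) + 1, 2):
        for F in combinations(cycle_edges, size):
            lhs = sum(cut[e] for e in F) - sum(cut[e] for e in cycle_edges if e not in F)
            violation = lhs - (size - 1)
            if best is None or violation > best[1]:
                best = (F, violation)
    return best
```

On a triangle where every $\mu_i = (0.5, 0.5)$ and every edge puts mass 0.5 on each disagreeing pair of states, the point passes the pairwise consistency check, yet taking $F$ to be all three edges gives $3 - 2 = 1 > 0$, so the cycle inequality cuts it off. No integral assignment can do this, because any assignment cuts an even number of a cycle's edges.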

### Citations

1485 | Fast approximate energy minimization via graph cuts - Boykov, Veksler, et al. - 2001

1129 | A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms - Scharstein, Szeliski - 2002
Citation Context ...igure 1-1). This disparity can be straightforwardly translated into depth from the camera. The best algorithms currently known for the stereo problem are those that minimize a global energy function (Scharstein & Szeliski, 2002), which is equivalent to finding a MAP configuration in a pairwise model. For our experiments we use the pairwise model described in Yanover et al. (2006), and apply our procedure to the “Tsukuba” se...

974 | Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming - Goemans, Williamson - 1995

870 | An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision - Boykov, Kolmogorov - 2004
Citation Context ...random fields was first observed by Greig et al. (1989), and has become extremely influential in computer vision, where such edge potentials are commonly used for image segmentation, among other problems (Boykov & Kolmogorov, 2004). There has been significant interest in the computer vision and machine learning communities in using these efficient combinatorial algorithms for nonbinary MRFs. For example, Boykov et al. (2001) s...

517 | Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms - Collins

499 | Interior point methods in semidefinite programming with applications to combinatorial optimization - Alizadeh - 1995

495 | Loopy belief propagation for approximate inference: An empirical study - Murphy, Weiss, et al. - 1999
Citation Context ...to give good results in many cases when the pairwise LP relaxation is tight, such as for bipartite matching (Bayati et al., 2008), it gives much worse results in models where there is frustration (Murphy et al., 1999). These are precisely the cases when the pairwise LP relaxation is not tight. By using a message-passing algorithm that directly solves the LP relaxation, we have a clear way of improving the approxi...

442 | Constructing free-energy approximations and generalized belief propagation algorithms - Yedidia, Freeman, et al.
Citation Context ...the MAP inference problem, from local search in the space of assignments, to optimization. One particularly successful class of approximate inference algorithms consists of message-passing algorithms (Yedidia et al., 2005). Message-passing algorithms solve the inference problem by passing messages along edges of the graph that summarize each variable’s beliefs. After a node receives messages from its neighbors, it upd...
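Message passing is exact on tree-structured models, which makes a chain a useful baseline for the loopy setting discussed here. The following minimal sketch assumes a chain-structured MRF with log-potentials `theta_node[i][s]` and `theta_edge[i][s][t]` for edge $(i, i+1)$ (hypothetical names). It runs the forward recursion $b_i(x_i) = \theta_i(x_i) + \max_{x_{i-1}} [b_{i-1}(x_{i-1}) + \theta_{i-1,i}(x_{i-1}, x_i)]$ and then backtracks to decode a MAP assignment. This is max-product belief propagation in the log domain (equivalently the Viterbi algorithm), not the MPLP algorithm used in the thesis.

```python
def chain_map(theta_node, theta_edge):
    """Exact MAP on a chain via forward max-sum message passing plus backtracking.

    theta_node[i][s]: log-potential of x_i = s.
    theta_edge[i][s][t]: log-potential of (x_i, x_{i+1}) = (s, t).
    Returns (assignment, score).
    """
    n, k = len(theta_node), len(theta_node[0])
    belief = list(theta_node[0])  # best prefix score ending in each state of x_0
    backptrs = []
    for i in range(1, n):
        new_belief, ptr = [], []
        for t in range(k):
            # message into x_i = t: maximize over the previous variable's state s
            s_best = max(range(k), key=lambda s: belief[s] + theta_edge[i - 1][s][t])
            new_belief.append(belief[s_best] + theta_edge[i - 1][s_best][t] + theta_node[i][t])
            ptr.append(s_best)
        belief = new_belief
        backptrs.append(ptr)
    # decode: best final state, then follow back-pointers toward x_0
    last = max(range(k), key=lambda t: belief[t])
    assignment = [last]
    for ptr in reversed(backptrs):
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return assignment, belief[last]
```

With three binary variables, node potentials `[[0, 1.5], [0.5, 0], [2, 0]]`, and an agreement bonus of 1 on each edge, the sketch returns `([1, 0, 0], 5.0)`, which matches exhaustive enumeration of the eight assignments.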

371 | Introduction to Linear Optimization (Athena Scientific) - Bertsimas, Tsitsiklis - 1997
Citation Context ...ur dual algorithms for tightening the LP relaxation, given in Chapter 5 and Chapter 7, are delayed column generation methods, where variables rather than constraints are iteratively added to the LP (Bertsimas & Tsitsiklis, 1997). By solving and tightening the LP relaxations completely in the dual, our algorithms resolve many of the problems raised by Liers et al. (2004), and thus may also be of interest to the operations re...

346 | Exact Maximum A Posteriori Estimation for Binary Images - Greig, Porteous, et al. - 1989

340 | Turbo decoding as an instance of Pearl’s belief propagation algorithm - McEliece, MacKay, et al. - 1998
Citation Context ... 2000). Finally, the decoding problem for low-density parity-check codes (LDPCs), and other codes, corresponds to MAP inference in a non-pairwise Markov random field with binary variables (see, e.g., McEliece et al., 1998). An iterative algorithm called belief propagation, discussed further in the next section, was found to be surprisingly effective at these inference problems, and spurred a significant amount of rese...

324 | Convergent Tree-Reweighted Message Passing for Energy Minimization - Kolmogorov - 2006
Citation Context ...an be optimized over in closed form. In recent years, a number of dual LP relaxation algorithms have been proposed, and these have been demonstrated to be useful tools for solving large MAP problems (Kolmogorov, 2006; Werner, 2007; Globerson & Jaakkola, 2008; Komodakis & Paragios, 2008). These algorithms can all be understood as dual coordinate descent, but operate in different duals of the same pairwise LP relax...

316 | Polynomial-time approximation algorithms for the Ising model - Jerrum, Sinclair - 1993
Citation Context ...rresponding to a summation over all possible assignments to the model. Calculating $Z(\theta)$ can be shown to be #P-hard, although fully polynomial approximation algorithms do exist for some special cases (Jerrum & Sinclair, 1993). We may also be interested in computing marginal probabilities for the variables in the model. We use the following notation to refer to the single node and edge marginals: $\mu_i(x_i) = \mathbb{E}_\theta[\phi_{i;x_i}(x)]$ for all $i \in V$...
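The summation defining $Z(\theta)$ can be written down directly for a tiny model, which also shows why it is infeasible in general: the loop below visits all $2^n$ assignments. This minimal sketch assumes binary variables and the exponential-family form $p(x) \propto \exp(\sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j))$; the names `theta_node`, `theta_edge`, and `brute_force_inference` are hypothetical. Taking $\phi_{i;x_i}$ to be the indicator that variable $i$ is in state $x_i$, the returned node marginals are $\mu_i(x_i) = \mathbb{E}_\theta[\phi_{i;x_i}(x)] = \Pr_\theta(X_i = x_i)$.

```python
import itertools
import math


def brute_force_inference(n, theta_node, theta_edge):
    """Exact Z(theta) and node marginals by enumerating all 2**n binary assignments.

    theta_node[i][a]: log-potential for x_i = a.
    theta_edge[(i, j)][a][b]: log-potential for (x_i, x_j) = (a, b).
    """
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        score = sum(theta_node[i][x[i]] for i in range(n))
        score += sum(table[x[i]][x[j]] for (i, j), table in theta_edge.items())
        weights[x] = math.exp(score)  # unnormalized probability of assignment x
    Z = sum(weights.values())
    marginals = [[0.0, 0.0] for _ in range(n)]
    for x, w in weights.items():
        for i in range(n):
            marginals[i][x[i]] += w / Z  # accumulate Pr(X_i = x_i)
    return Z, marginals
```

Each additional variable doubles the number of terms in the sum, which is why exact computation is limited to very small models.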

257 | Geometry of Cuts and Metrics - Deza, Laurent - 1997
Citation Context ...alued pairwise MRFs because this special case is equivalent to the maximum cut problem. In this case, the marginal polytope is isomorphic to the cut polytope, the convex hull of all valid graph cuts (Deza & Laurent, 1997; Sontag, 2007). The pairwise LP relaxation can be shown to have the following properties in this setting: • The fractional vertices are half-integral (Deza & Laurent, 1997). Each edge marginal...

238 | Fast approximation algorithms for fractional packing and covering problems - Plotkin, Shmoys, et al. - 1995
Citation Context ... packing LP can be shown to be equivalent to the pairwise LP relaxation (see Section 2.5.1). In these cases, the fractional packing problem can be approximately solved using combinatorial algorithms (Plotkin et al., 1995; Awerbuch & Khandekar, 2008). It would be extremely interesting to obtain combinatorial algorithms that could efficiently solve LP or SDP relaxations for non-binary pairwise MRFs more generally. 2.4 ...

234 | An algorithm for integer solutions to linear programs - Gomory - 1963

204 | A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems - Sherali, Adams - 1990
Citation Context ... be exact earlier (Wainwright & Jordan, 2008, p. 227). This sequence of relaxations is known in the polyhedral combinatorics and theoretical computer science literature as the Sherali-Adams hierarchy (Sherali & Adams, 1990; Wainwright & Jordan, 2004, 2008). This thesis will be largely devoted to a study of the first lifting of the Sherali-Adams hierarchy, which we can state more simply as $\mathrm{TRI}(G) = \{ \mu \geq 0 \mid \exists \tau \geq 0,\ \sum_{x_j} \ldots$ ...

178 | Optimal inapproximability results for MAX-CUT and other 2-variable CSPs - Khot, Kindler, et al.

165 | Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov Random Fields - Kleinberg, Tardos - 1999
Citation Context ...for nonbinary MRFs. For example, Boykov et al. (2001) showed how to use local search together with graph cuts to approximately find the MAP assignment in non-binary MRFs. The metric labeling problem (Kleinberg & Tardos, 1999) is an instance of MAP in a non-binary MRF where there is a metric space on the variables’ states and the edge potentials are proportional to the distance according to this metric. Metric labeling ge...

163 | A new class of upper bounds on the log partition function - Wainwright, Jaakkola, et al. - 2002

141 | The complexity of multiterminal cuts - Dahlhaus, Johnson, et al. - 1994
Citation Context ... the variables’ states and the edge potentials are proportional to the distance according to this metric. Metric labeling generalizes several combinatorial optimization problems such as multiway cut (Dahlhaus et al., 1994). Metric labeling has an $O(1/\log k)$-approximation, where $k$ is the number of states per variable, which can be obtained using either combinatorial algorithms or an LP relaxation (Kleinberg & Tardos, 1...

140 | MAP Estimation via Agreement on Trees: Message-Passing and Linear Programming - Wainwright, Jaakkola, et al. - 2005

134 | Comparison of Graph Cuts with Belief Propagation for Stereo, Using - Tappen, Freeman - 2003

123 | On the cut polytope - Barahona, Mahjoub - 1986
Citation Context ....2 Cycle Inequalities. In this section we describe a polyhedral approach to tightening the relaxation, where we use new constraints but no new variables. We begin by describing the cycle inequalities (Barahona & Mahjoub, 1986; Barahona, 1993; Deza & Laurent, 1997) for graphical models with binary variables. Let $\mathbb{M}_{\{0,1\}}$ denote the marginal polytope of a binary pairwise MRF. Given an assignment $x \in \{0,1\}^n$, edge $ij \in E$ is...

123 | Vertex Packings: Structural properties and algorithms - Nemhauser, Trotter - 1975

115 | Pseudo-boolean optimization - Boros, Hammer
Citation Context ...where the LP relaxation has a fractional solution. • Persistency, which guarantees that there exists a MAP assignment that extends the integer parts of a fractional vertex (Nemhauser & Trotter, 1975; Boros & Hammer, 2002). This can be applied to some non-binary MRFs by using a clever transformation of the problem into a binary MRF (Kohli et al., 2008). • Gives a 2-approximation when applied to the maximum cut problem...

112 | Using linear programming to decode binary linear codes - Feldman, Wainwright, et al. - 2005
Citation Context ...oblems, and spurred a significant amount of research in the theoretical computer science community. Linear programming relaxations can also be shown to give provably good information-theoretic rates (Feldman et al., 2005; Vontobel & Koetter, 2006; Arora et al., ... [Footnote 2: MAP inference in Ising models with arbitrary external fields, or node potentials, can be transformed into an s-t minimum cut problem, solvable in polynomial...]

109 | A linear programming approach to max-sum problem: A review - Werner - 2005
Citation Context ...er in closed form. In recent years, a number of dual LP relaxation algorithms have been proposed, and these have been demonstrated to be useful tools for solving large MAP problems (Kolmogorov, 2006; Werner, 2007; Globerson & Jaakkola, 2008; Komodakis & Paragios, 2008). These algorithms can all be understood as dual coordinate descent, but operate in different duals of the same pairwise LP relaxation, arising...

100 | Finding MAPs for belief networks is NP-hard - Shimony - 1994
Citation Context ...le, the stereo vision and the protein design problems – do not have such structure. In terms of worst-case theoretical guarantees, inference is NP-hard, and in many cases cannot even be approximated (Shimony, 1994). However, as we motivated earlier with regard to stereo vision, it is not at all clear that real-world inference problems are as difficult as the theoretical worst case. Humans can estimate depth f...

90 | Introduction to Algorithms, 2nd edition - Cormen, Leiserson, et al. - 2001
Citation Context ... projection graph $G_\pi = (V_\pi, E_\pi)$ and edge pseudomarginals $\mu_\pi$, find the most violated cycle inequality. Using Dijkstra’s shortest-paths algorithm with a Fibonacci heap to implement the priority queue (Cormen et al., 2001), each shortest-paths computation takes time $O(n \log n + |E'|)$, where $n = |V'|$. Thus, the overall running time of the separation algorithm from Figure 3-4 is $O(n^2 \log n + n|E'|)$. Implementation ...
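For binary pairwise MRFs, the separation problem above can be sketched with the classical doubled-graph construction used for max cut. This is a simplification: the text works with a projection graph $G_\pi$, which also covers non-binary variables, and this sketch does not. Write $c_{ij} = \mu_{ij}(0,1) + \mu_{ij}(1,0)$ for the probability that edge $ij$ is cut. The cycle inequality for a cycle $C$ and odd-size $F \subseteq C$ is equivalent to $\sum_{ij \in F} (1 - c_{ij}) + \sum_{ij \in C \setminus F} c_{ij} \geq 1$. So each vertex $i$ gets two copies $(i, 0)$ and $(i, 1)$, and each edge gets a parity-preserving copy of weight $c_{ij}$ and a parity-flipping copy of weight $1 - c_{ij}$. A shortest path from $(i, 0)$ to $(i, 1)$ of length below 1 then exposes a violated inequality. One Dijkstra run per vertex mirrors the $n$ shortest-path computations in the bound above. Python's `heapq` is a binary heap, so each run costs $O((n + |E|) \log n)$ rather than the Fibonacci-heap bound. The function name and input format are hypothetical.

```python
import heapq


def most_violated_cycle(n, cut_prob, eps=1e-9):
    """Separation for cycle inequalities in a binary pairwise MRF (doubled-graph sketch).

    cut_prob[(i, j)] = mu_ij(0,1) + mu_ij(1,0) for each edge, vertices 0..n-1.
    Returns (violation, closed_walk) for the most violated inequality found, or None.
    """
    adj = {(v, s): [] for v in range(n) for s in (0, 1)}
    for (i, j), c in cut_prob.items():
        c = min(max(c, 0.0), 1.0)  # keep weights nonnegative for Dijkstra
        for s in (0, 1):
            # parity-preserving copy: edge ij not in F, costs c
            adj[(i, s)].append(((j, s), c))
            adj[(j, s)].append(((i, s), c))
            # parity-flipping copy: edge ij in F, costs 1 - c
            adj[(i, s)].append(((j, 1 - s), 1.0 - c))
            adj[(j, s)].append(((i, 1 - s), 1.0 - c))
    best = None
    for root in range(n):
        src, dst = (root, 0), (root, 1)
        dist, prev = {src: 0.0}, {}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u == dst:
                break  # dst's distance is final once popped
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        if dst in dist and dist[dst] < 1.0 - eps:
            violation = 1.0 - dist[dst]
            if best is None or violation > best[0]:
                walk, node = [dst[0]], dst
                while node != src:
                    node = prev[node]
                    walk.append(node[0])
                best = (violation, walk[::-1])
    return best
```

The returned closed walk may repeat vertices. Because all weights are nonnegative, a violated closed walk decomposes into pieces one of which is a violated simple cycle; this sketch does not perform that extraction.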

90 | A comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre relaxations for 0-1 programming - Laurent
Citation Context ...o Sherali-Adams, Lovász-Schrijver, and Lasserre Hierarchies. Two other lift-and-project techniques that are frequently studied in integer programming are the Lovász-Schrijver and Lasserre hierarchies (Laurent, 2003; Wainwright & Jordan, 2004). In contrast to the Sherali-Adams hierarchy, these correspond to semi-definite outer bounds on the marginal polytope. Alizadeh (1993) gives an efficient interior-point alg...

89 | Residual belief propagation: Informed scheduling for asynchronous message passing - Elidan, McGraw, et al. - 2006

84 | Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations - Globerson, Jaakkola - 2007
Citation Context ...orm. In recent years, a number of dual LP relaxation algorithms have been proposed, and these have been demonstrated to be useful tools for solving large MAP problems (Kolmogorov, 2006; Werner, 2007; Globerson & Jaakkola, 2008; Komodakis & Paragios, 2008). These algorithms can all be understood as dual coordinate descent, but operate in different duals of the same pairwise LP relaxation, arising from different approaches t...

77 | The volume algorithm: producing primal solutions with a subgradient method - Barahona, Anbil - 2000
Citation Context ...inatorics because of its relevance to max cut and Ising models. There is a huge literature in the operations research community on cutting-plane algorithms for max cut that use the cycle inequalities (Barahona & Anbil, 2000; Liers et al., 2004; Frangioni et al., 2005). To apply the cutting-plane approach, we must answer several key questions: 1. What are valid constraints for the marginal polytope? We already discussed ...

77 | Minimum cuts in near-linear time - Karger - 1996

77 | Bethe free energy, Kikuchi approximations, and belief propagation algorithms - Yedidia, Freeman, et al. - 2001
Citation Context ...P relaxation which has precisely these constraints. Belief propagation can be seen as optimizing pseudomarginals over the pairwise relaxation with the (non-convex) Bethe approximation to the entropy (Yedidia et al., 2001). The tree-reweighted sum-product algorithm (Wainwright et al., 2005b), on the other hand, uses a concave upper bound on the entropy, expressed as a convex combination of entropies corresponding to t...

74 | Belief Propagation and Revision in Networks with Loops - Weiss - 1997

70 | Maximizing quadratic programs: extending Grothendieck’s inequality - Charikar, Wirth - 2004

63 | A combinatorial, primal-dual approach to semidefinite programs - Arora, Kale - 2007
Citation Context ...mum cut is based on semi-definite programming. Recently, several authors have considered fast algorithms for obtaining the same approximation guarantee, without having to solve the SDP to optimality (Arora & Kale, 2007; Trevisan, 2009). Arora et al. (2005) give algorithms based on multiplicative weights to solve the SDP relaxation of MaxQP. Some MAP inference problems can be transformed into packing problems, where...

61 | Dual decomposition for parsing with nonprojective head automata - Koo, Rush, et al. - 2010
Citation Context ... example, we have recently had success applying a dual decomposition approach similar to that given in Chapter 4 to the NLP problem of non-projective dependency parsing with higher-order constraints (Koo et al., 2010). Our algorithm uses a combinatorial algorithm to solve a first-order parsing subproblem as part of the overall optimization, analogous to how we used dynamic programming to solve spanning tree subpr...

57 | Efficient rotamer elimination applied to protein side-chains and related spin glasses - Goldstein - 1994
Citation Context ...e of the most effective methods is called dead-end elimination, and corresponds to iteratively applying local rules that attempt to rule out variables’ states as not being part of the MAP assignment (Goldstein, 1994; Pierce et al., 2000). Finally, the decoding problem for low-density parity-check codes (LDPCs), and other codes, corresponds to MAP inference in a non-pairwise Markov random field with binary variab...

53 | On the optimality of tree-reweighted max-product message passing - Kolmogorov, Wainwright - 2005
Citation Context ...ded join-graphs described in Dechter et al. (2002). There is a large body of recent work describing the relationship between message-passing algorithms such as belief propagation, and LP relaxations (Kolmogorov & Wainwright, 2005; Weiss et al., 2007; Yanover et al., 2006). Although we have focused here on using one particular message-passing algorithm, MPLP, we emphasize that similar region-pursuit algorithms can be derived f...

52 | The partial constraint satisfaction problem: Facets and lifting theorems (Operations Research Letters) - Koster, van Hoesel, et al. - 1998

49 | High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF) - Werner - 2008

39 | Using Combinatorial Optimization within Max-Product Belief Propagation - Duchi, Tarlow, et al. - 2007

39 | Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles - Komodakis, Paragios - 2008
Citation Context ...lving the LP relaxations. 3.5 Related Work. Recently, several authors have suggested seemingly different approaches to tightening the pairwise LP relaxation (Globerson & Jaakkola, 2007a; Werner, 2008; Komodakis & Paragios, 2008; Schraudolph & Kamenetsky, 2009; Batra et al., 2010; Schraudolph, 2010). We show in this section that these approaches can all be viewed as enforcing either cluster or cycle consistency, and thus cor...

38 | Conformational splitting: a more powerful criterion for dead-end elimination - Pierce, Spriet, et al. - 2000
Citation Context ...ective methods is called dead-end elimination, and corresponds to iteratively applying local rules that attempt to rule out variables’ states as not being part of the MAP assignment (Goldstein, 1994; Pierce et al., 2000). Finally, the decoding problem for low-density parity-check codes (LDPCs), and other codes, corresponds to MAP inference in a non-pairwise Markov random field with binary variables (see, e.g., McEli...

35 | Quadratic forms on graphs - Alon, Makarychev, et al. - 2005
Citation Context ...ation algorithm, where n is the number of variables in the MRF, also based on semi-definite programming. This approach is closely related to the study of Grothendieck’s inequality (Alon & Naor, 2004; Alon et al., 2005). On the positive side, MAP inference in Ising models with ferromagnetic potentials, or positive edge weights, is equivalent to the minimum cut problem, which can be solved in nearly linear time (Kar...

34 | Message-passing for graph-structured linear programs: Proximal projections, convergence and rounding schemes - Ravikumar, Agarwal, et al. - 2008

33 | Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation - Meltzer, Yanover, et al. - 2005
Citation Context ...ter. In the first experiment, we varied the parameters of the energy function to create several different instances. We tried to find the MAP using TRBP, resolving ties using the methods proposed by Meltzer et al. (2005). In 4 out of 10 cases those methods failed. Using our algorithm, we managed to find the MAP for all 4 cases. Figure 5-4 shows the dual objective and the decoded integer solution after each MPLP it...