## MAP estimation via agreement on trees: Message-passing and linear programming (2002)

Citations: 132 (8 self)

### BibTeX

@MISC{Wainwright02mapestimation,

author = {M. J. Wainwright and others},

title = {MAP estimation via agreement on trees: Message-passing and linear programming},

year = {2002}

}

### Abstract

We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of tree-structured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is tight if and only if all the tree distributions share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches to attempting to obtain tight upper bounds: (a) a tree-relaxed linear program (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a tree-reweighted max-product message-passing algorithm that is related to but distinct from the max-product algorithm. In this way, we establish a connection between a certain LP relaxation of the mode-finding problem, and a reweighted form of the max-product (min-sum) message-passing algorithm.
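The upper bound described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's algorithm: it assumes a three-node cycle with spins in {-1, +1}, a uniform distribution over the three spanning trees, and brute-force maximization in place of message passing. The names `score`, `best`, and `tree_bound` are hypothetical helpers introduced here.

```python
from itertools import product

NODES = [0, 1, 2]
EDGES = [(0, 1), (1, 2), (0, 2)]
# Each spanning tree of the triangle drops exactly one edge.
TREES = [[e for e in EDGES if e != dropped] for dropped in EDGES]
RHO = [1.0 / 3.0] * 3  # assumed: uniform weights over the spanning trees


def score(x, node_w, edge_w):
    """Unnormalized log probability of configuration x (spins in {-1, +1})."""
    total = sum(node_w[s] * x[s] for s in NODES)
    total += sum(w * x[s] * x[t] for (s, t), w in edge_w.items())
    return total


def best(node_w, edge_w):
    """Brute-force maximization; returns (optimal value, optimal configuration)."""
    configs = product([-1, 1], repeat=len(NODES))
    return max((score(x, node_w, edge_w), x) for x in configs)


def tree_bound(node_w, edge_w):
    """Convex combination of the tree optima, an upper bound on the MAP value."""
    # Edge appearance probability: total weight of the trees containing the edge.
    edge_prob = {e: sum(r for r, tree in zip(RHO, TREES) if e in tree) for e in EDGES}
    bound = 0.0
    for r, tree in zip(RHO, TREES):
        # Rescale edge weights so the tree parameters average back to the original.
        tree_edge_w = {e: edge_w[e] / edge_prob[e] for e in tree}
        value, _ = best(node_w, tree_edge_w)
        bound += r * value
    return bound


if __name__ == "__main__":
    attractive = ({s: 0.1 for s in NODES}, {e: 1.0 for e in EDGES})
    frustrated = ({s: 0.0 for s in NODES}, {e: -1.0 for e in EDGES})
    for name, (node_w, edge_w) in [("attractive", attractive), ("frustrated", frustrated)]:
        map_value, _ = best(node_w, edge_w)
        print(name, "MAP value:", map_value, "tree bound:", tree_bound(node_w, edge_w))
```

In the attractive case every tree shares the all-ones optimum, so the bound coincides with the MAP value. In the frustrated case the trees cannot agree on a configuration and the bound is strictly larger.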

### Citations

3280 | Variational Analysis
- Rockafellar, Wets
- 1998
Citation Context ...e marginal polytope associated with the graph . By definition, any marginal polytope is the convex hull of a finite number of vectors—namely, the collection . Consequently, the Minkowski–Weyl theorem =-=[37]-=- ensures that can be represented as an intersection of halfspaces where is a finite index set and each halfspace is of the form for some and . These half-space constraints include the nonnegativity co... |

1477 |
Exactly Solved Models in Statistical Mechanics
- Baxter
- 1982
Citation Context ...an be represented as the exponential family , where Note that each vector indexes a particular Markov random field in this exponential family. Example 1: The Ising model of statistical physics (e.g., =-=[5]-=-) provides a simple illustration of a collection of MRFs in this form. This model involves a vector , with a distribution defined by potential functions only on cliques of size at most two (i.e., vert... |

1169 |
Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm
- Viterbi
Citation Context ...ramming known as the max-product or min-sum algorithm (e.g., [2], [14], [15]). This algorithm, which entails passing “messages” from node to node, represents a generalization of the Viterbi algorithm =-=[40]-=- from chains to arbitrary cycle-free graphs. In recent years, the max-product algorithm has also been studied in application to graphs with cycles as a method for computing approximate MAP assignments... |

1141 |
Extremal graph theory
- Bollobás
- 1978
Citation Context ...ction provides the background and some preliminary developments necessary for subsequent sections. We begin with a brief overview of some graph-theoretic basics; we refer the reader to the books [9], =-=[10]-=- for additional background on graph theory. We then describe the formalism of Markov random fields; more details can be found in various sources (e.g., [12], [14], [34]). We conclude by formulating th... |

1109 | Graphical Models - Lauritzen - 1996 |

1023 | Multiuser Detection
- Verdú
- 1998
Citation Context ...I. INTRODUCTION INTEGER programming problems arise in various fields, including communication theory, error-correcting coding, image processing, statistical physics, and machine learning (e.g., [35], =-=[39]-=-, [8]). Many such problems can be formulated in terms of Markov random fields (e.g., [8], [14]), in which the cost function corresponds to a graph-structured probability distribution, and the goal is ... |

927 | On the statistical analysis of dirty pictures
- Besag
- 1986
Citation Context ...RODUCTION INTEGER programming problems arise in various fields, including communication theory, error-correcting coding, image processing, statistical physics, and machine learning (e.g., [35], [39], =-=[8]-=-). Many such problems can be formulated in terms of Markov random fields (e.g., [8], [14]), in which the cost function corresponds to a graph-structured probability distribution, and the goal is to fi... |

748 |
Neuro-Dynamic Programming
- Bertsekas, Tsitsiklis
- 1996
Citation Context ...ure of , meaning that . Note that the cost function is a convex combination of convex functions; moreover, with fixed, the constraints are all linear in . Under these conditions, strong duality holds =-=[6]-=-, so that this constrained optimization problem can be tackled via its Lagrangian dual. The dual formulation turns out to have a surprisingly simple and intuitive form. In particular, recall the set d... |

744 | Nonlinear Programming. Athena Scientific - Bertsekas - 1995 |

628 |
Probabilistic Networks and Expert Systems
- Cowell, Dawid, et al.
- 1999
Citation Context ...theory, error-correcting coding, image processing, statistical physics, and machine learning (e.g., [35], [39], [8]). Many such problems can be formulated in terms of Markov random fields (e.g., [8], =-=[14]-=-), in which the cost function corresponds to a graph-structured probability distribution, and the goal is to find the maximum a posteriori probability (MAP) configuration. It is well known that the co... |

554 |
Combinatorial optimization: Polyhedra and Efficiency
- Schrijver
- 2003
Citation Context ...ion of the MAP estimation problem. Overall, this paper establishes connections between two approaches to solving the MAP estimation problem: LP relaxations of integer programming problems (e.g., [7], =-=[38]-=-), and (approximate) dynamic programming methods using message passing in the max-product algebra. More specifically, our work shows that a (suitably reweighted) form of the max-product or min-sum alg... |

530 |
Algebraic graph theory
- Biggs
- 1993
Citation Context ...is section provides the background and some preliminary developments necessary for subsequent sections. We begin with a brief overview of some graph-theoretic basics; we refer the reader to the books =-=[9]-=-, [10] for additional background on graph theory. We then describe the formalism of Markov random fields; more details can be found in various sources (e.g., [12], [14], [34]). We conclude by formulat... |

431 |
Convex Analysis and Minimization Algorithms I
- Hiriart-Urruty, Lemaréchal
- 1993
Citation Context ...where is a fixed collection of noisy observations. ... is its convexity, which allows us to apply Jensen’s inequality =-=[28]-=-. More specifically, let be a finite collection of nonnegative weights that sum to one, and consider a collection of exponential parameters such that . Then applying Jensen’s inequality yields the upp... |

348 | Introduction to Linear Optimization, Athena Scientific - Bertsimas, Tsitsiklis - 1997 |

329 |
Exact maximum a posteriori estimation for binary images
- Greig, Porteous, et al.
- 1989
Citation Context ... couplings. First, it can be shown that tree-reweighted message-passing always succeeds for any submodular binary problem, of which the example given in Fig. 5 is a special case. Although it is known =-=[26]-=- that such problems can be solved in polynomial time via reduction to a max-flow problem, it is nonetheless interesting that tree-reweighted message-passing is also successful for this class of proble... |

320 |
Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues
- Brémaud
- 1999
Citation Context ...sics; we refer the reader to the books [9], [10] for additional background on graph theory. We then describe the formalism of Markov random fields; more details can be found in various sources (e.g., =-=[12]-=-, [14], [34]). We conclude by formulating the MAP estimation problem for a Markov random field. A. Undirected Graphs An undirected graph consists of a set of nodes or vertices that are joined by a set... |

313 | Turbo decoding as an instance of Pearl’s “belief propagation” algorithm
- McEliece, MacKay, et al.
- 1998
Citation Context ...ithm. I. INTRODUCTION INTEGER programming problems arise in various fields, including communication theory, error-correcting coding, image processing, statistical physics, and machine learning (e.g., =-=[35]-=-, [39], [8]). Many such problems can be formulated in terms of Markov random fields (e.g., [8], [14]), in which the cost function corresponds to a graph-structured probability distribution, and the go... |

304 | Understanding belief propagation and its generalizations - Yedidia, Freeman, et al. - 2003 |

294 |
The generalized distributive law
- Aji, McEliece
Citation Context ... Associate Editor for Detection and Estimation. Digital Object Identifier 10.1109/TIT.2005.856938. 0018-9448/$20.00 © 2005 IEEE. ... dynamic programming known as the max-product or min-sum algorithm (e.g., =-=[2]-=-, [14], [15]). This algorithm, which entails passing “messages” from node to node, represents a generalization of the Viterbi algorithm [40] from chains to arbitrary cycle-free graphs. In recent years... |

265 | Codes and Decoding on General Graphs
- Wiberg
- 1996
Citation Context ...mance guarantees for particular codes and channels. In the context of decoding, the fractional vertices of the polytope have a very concrete interpretation as pseudocodewords (e.g., [22], [25], [29], =-=[45]-=-). More broadly, it remains to further explore and analyze the range of problems for which the iterative algorithms and LP relaxations described here are suitable. VI. DISCUSSION In this paper, we dem... |

185 | On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs
- Weiss, Freeman
Citation Context ...bitrary cycle-free graphs. In recent years, the max-product algorithm has also been studied in application to graphs with cycles as a method for computing approximate MAP assignments (e.g., [1], [21]–=-=[23]-=-, [29], [43]). Although the method may perform well in practice, it is no longer guaranteed to output the correct MAP assignment, and it is straightforward to demonstrate problems on which it specifie... |

156 | A new class of upper bounds on the log partition function
- Wainwright, Jaakkola, et al.
- 2005
Citation Context ...7], who showed that the sum–product algorithm has a variational interpretation involving the so-called Bethe free energy. In addition, the work described here is linked in spirit to our previous work =-=[41]-=-, [44], in which we showed how to upper-bound the log partition function using a “convexified form” of the Bethe free energy. Whereas this convex variational problem led to a method for computing appr... |

112 | Using linear programming to decode binary linear codes - Feldman, Wainwright, et al. - 2005 |

112 | Convex Analysis and Minimization Algorithms, vols - Hiriart-Urruty, Lemaréchal - 1993 |

108 |
Introduction to Linear Optimization
- Bertsimas, Tsitsiklis
- 1998
Citation Context ...mulation of the MAP estimation problem. Overall, this paper establishes connections between two approaches to solving the MAP estimation problem: LP relaxations of integer programming problems (e.g., =-=[7]-=-, [38]), and (approximate) dynamic programming methods using message passing in the max-product algebra. More specifically, our work shows that a (suitably reweighted) form of the max-product or min-s... |

102 | Tree-based reparameterization framework for analysis of sum-product and related algorithms - Wainwright, Jaakkola, et al. |

98 |
Matroids and the greedy algorithm
- Edmonds
- 1971
Citation Context ...appear in two and one of the spanning trees, respectively, which gives rise to edge appearance probabilities 2/3 and 1/3. ... such vector must belong to the so-called spanning tree polytope [7], =-=[17]-=-, which we denote by . See Fig. 1 for an illustration of the edge appearance probabilities. Although we allow for the support to be a strict subset of the set of all spanning trees, we require that fo... |

87 |
Applications of a general propagation algorithm for probabilistic expert systems
- Dawid
- 1992
Citation Context ...y edge . Proof: Necessity of the edge consistency is clear. The sufficiency can be established by an inductive argument in which successive nodes are stripped from the tree by local maximization; see =-=[15]-=-, [43] for further details. The max-marginal representation (36) allows the global problem of MAP estimation to be solved by performing a set of local maximization operations. In particular, suppose t... |

68 | Information geometry on a hierarchy of probability distributions
- Amari
Citation Context ... distribution over cliques of the graph. We use the latter characterization here. For the sake of development in the sequel, it is convenient to describe this decomposition in exponential form (e.g., =-=[3]-=-). We begin with some necessary notation. A potential function associated with a given clique is mapping that depends only on the subcollection . There may be a family of potential functions associate... |

68 |
Roof duality, complementation and persistency in quadratic 0–1
- Hammer, Hansen, et al.
- 1984
Citation Context ... Bethe approximation, as discussed in our previous work [44]. For binary variables, the linear program (27) can be shown to be equivalent to a relaxation that has been studied in previous work (e.g., =-=[27]-=-, [11]). The derivation given here illuminates the critical role of graphical structure in controlling the tightness of such a relaxation. In particular, an immediate consequence of our development is... |

52 | On the effective weights of pseudocodewords for codes defined on graphs with cycles
- Forney, Koetter, et al.
- 2001
Citation Context ...nite-length performance guarantees for particular codes and channels. In the context of decoding, the fractional vertices of the polytope have a very concrete interpretation as pseudocodewords (e.g., =-=[22]-=-, [25], [29], [45]). More broadly, it remains to further explore and analyze the range of problems for which the iterative algorithms and LP relaxations described here are suitable. VI. DISCUSSION In ... |

36 | Graph-covers and iterative decoding of finite length codes
- Koetter, Vontobel
- 2003
Citation Context ...coding methods for turbo-like and low-density parity-check (LDPC) codes, and provided finite-length performance guarantees for particular codes and channels. In independent work, Koetter and Vontobel =-=[30]-=- used the notion of a graph cover to establish connections between the ordinary max-product algorithm for LDPC codes and a particular polytope equivalent to the one defining our LP relaxation. In othe... |

30 | Fractional belief propagation
- Wiegerinck, Heskes
- 2002
Citation Context ...lish connections between the ordinary max-product algorithm for LDPC codes and a particular polytope equivalent to the one defining our LP relaxation. In other independent work, Wiegerinck and Heskes =-=[46]-=- have proposed a “fractional” form of the sum–product algorithm that is closely related to the tree-reweighted sum–product algorithm considered in our previous work [44]; see also Minka [36] for a rew... |

27 |
Understanding Belief Propagation and its Generalizations, Mitsubishi
- Yedidia, Freeman, et al.
- 2000
Citation Context ... min-sum algorithm is very closely connected to a particular linear programming relaxation of the MAP integer program. This variational characterization has links to the recent work of Yedidia et al. =-=[47]-=-, who showed that the sum–product algorithm has a variational interpretation involving the so-called Bethe free energy. In addition, the work described here is linked in spirit to our previous work [4... |

24 | Linear programming-based decoding of turbo-like codes and its relation to iterative approaches
- Feldman, Karger, et al.
Citation Context ...e techniques described here. B. Notes and Related Developments We briefly summarize some developments related to the ideas described in this paper. In a parallel collaboration with Feldman and Karger =-=[19]-=-, [18], [20], we have studied the tree-relaxed LP and related message-passing algorithms as decoding methods for turbo-like and low-density parity-check (LDPC) codes, and provided finite-length perfor... |

24 |
Signal-space characterization of iterative decoding
- Frey, Koetter, et al.
Citation Context ...ength performance guarantees for particular codes and channels. In the context of decoding, the fractional vertices of the polytope have a very concrete interpretation as pseudocodewords (e.g., [22], =-=[25]-=-, [29], [45]). More broadly, it remains to further explore and analyze the range of problems for which the iterative algorithms and LP relaxations described here are suitable. VI. DISCUSSION In this p... |

15 |
Iterative decoding and pseudocodewords
- Horn
- 1999
Citation Context ...y cycle-free graphs. In recent years, the max-product algorithm has also been studied in application to graphs with cycles as a method for computing approximate MAP assignments (e.g., [1], [21]–[23], =-=[29]-=-, [43]). Although the method may perform well in practice, it is no longer guaranteed to output the correct MAP assignment, and it is straightforward to demonstrate problems on which it specifies an i... |

14 |
Geometry of cuts and metric embeddings
- Deza, Laurent
- 1997
Citation Context ... have the constraint . The number of additional (nontrivial) linear constraints required to characterize , though always finite, grows rapidly in for a general graph with cycles; see Deza and Laurent =-=[16]-=- for discussion of the binary case. It is straightforward, however, to specify a subset of constraints that any must satisfy. First, as mentioned previously, since the elements of are marginal probabi... |

13 | Iterative min-sum decoding of tail-biting codes
- Aji, Horn, et al.
- 1998
Citation Context ...ains to arbitrary cycle-free graphs. In recent years, the max-product algorithm has also been studied in application to graphs with cycles as a method for computing approximate MAP assignments (e.g., =-=[1]-=-, [21]–[23], [29], [43]). Although the method may perform well in practice, it is no longer guaranteed to output the correct MAP assignment, and it is straightforward to demonstrate problems on which ... |

12 | Multitarget-multisensor data association using the tree-reweighted max-product algorithm
- Chen, Wainwright, et al.
- 2003
Citation Context ...terative algorithms described here are useful. For instance, we have applied the tree-reweighted max-product algorithm to a distributed data association problem involving multiple targets and sensors =-=[13]-=-. For the class of problem considered, the tree-reweighted max-product algorithm converges, typically quite rapidly, to a provably MAP-optimal data association. In other collaborative work, we have al... |

12 |
Exact inference using the attenuated max-product algorithm. In Advanced mean field methods: Theory and Practice
- Frey, Koetter
- 2000
Citation Context ...onous manner, as in ordinary max-product [23], [43]. Moreover, we note that these tree-reweighted updates are related but distinct from the attenuated max-product updates proposed by Frey and Koetter =-=[24]-=-. A feature common to both algorithms is the reweighting of messages; however, unlike the tree-reweighted update (50), the attenuated max-product update in [24] of the message from to does not involve... |

11 | Tree consistency and bounds on the max-product algorithm and its generalizations. LIDS Tech. report, MIT; Available online at http://ssg.mit.edu/group/mjwain/mjwain.shtml - Wainwright, Jaakkola, et al. - 2002 |

9 |
Iterative decoding of tail-biting trellises and connections with symbolic dynamics
- Forney, Kschischang, et al.
- 2001
Citation Context ...to arbitrary cycle-free graphs. In recent years, the max-product algorithm has also been studied in application to graphs with cycles as a method for computing approximate MAP assignments (e.g., [1], =-=[21]-=-–[23], [29], [43]). Although the method may perform well in practice, it is no longer guaranteed to output the correct MAP assignment, and it is straightforward to demonstrate problems on which it spe... |

8 |
Information and exponential families
- Barndorff-Nielsen
- 1978
Citation Context ...mials on of maximum degree two that respect the structure of . When the collection of potential functions do not satisfy any linear constraints, then the representation (1) is said to be minimal [3], =-=[4]-=-. For example, the Ising model (2) is minimal, because there is no linear combination of the potentials that is equal to a constant for all . In contrast, it is often convenient to consider an overcom... |

6 |
Upper bounds for quadratic 0-1 maximization
- Boros, Crama, et al.
- 1990
Citation Context ... approximation, as discussed in our previous work [44]. For binary variables, the linear program (27) can be shown to be equivalent to a relaxation that has been studied in previous work (e.g., [27], =-=[11]-=-). The derivation given here illuminates the critical role of graphical structure in controlling the tightness of such a relaxation. In particular, an immediate consequence of our development is the f... |

6 |
Convergent tree-reweighted message-passing for energy minimization
- Kolmogorov
Citation Context ...ted to the tree-reweighted sum–product algorithm considered in our previous work [44]; see also Minka [36] for a reweighted version of the expectation propagation algorithm. In other work, Kolmogorov =-=[31]-=-, [32] has studied the tree-reweighted max-product message-passing algorithms presented here, and proposed a sequential form of tree updates for which certain convergence guarantees can be established... |

4 |
Using linear programming to decode LDPC codes
- Feldman, Karger, et al.
- 2003
Citation Context ...niques described here. B. Notes and Related Developments We briefly summarize some developments related to the ideas described in this paper. In a parallel collaboration with Feldman and Karger [19], =-=[18]-=-, [20], we have studied the tree-relaxed LP and related message-passing algorithms as decoding methods for turbo-like and low-density parity-check (LDPC) codes, and provided finite-length performance ... |

4 |
A new class of upper bounds on the log partition function
- Wainwright, Jaakkola, et al.
- 2005
Citation Context ...tion of the sum–product algorithm [47]; in fact, the LP relaxation itself can be obtained by taking a suitable limit of the “convexified” Bethe variational problem analyzed in our previous work [41], =-=[44]-=-. Second, we developed a family of tree-reweighted max-product algorithms that reparameterize a collection of tree-structured distributions in terms of a common set of pseudo-max-marginals on the node... |

3 |
On optimality properties of tree-reweighted message-passing
- Kolmogorov, Wainwright
- 2005
Citation Context ...duct message-passing algorithms presented here, and proposed a sequential form of tree updates for which certain convergence guarantees can be established. In followup work, Kolmogorov and Wainwright =-=[33]-=- provided stronger optimality properties of tree-reweighted message passing when applied to problems with binary variables and pairwise interactions. C. Outline The remainder of this paper is organize... |

3 |
Tree-based reparameterization framework for analysis of sum-product and related algorithms
- Wainwright, Jaakkola, et al.
- 2003
Citation Context ...es for each node ... 1 For brevity, we do not discuss hypertrees at length in this paper. Roughly speaking, they amount to trees formed on clusters of nodes from the original graph; see Wainwright et al. =-=[42]-=- for further details on hypertrees. (4) (5) ... [3700 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 11, NOVEMBER 2005] ... and for each edge Note that (5) and (6) define a -dimensional vector of margina... |